Re: [HACKERS] Declarative partitioning - another take
Hi Maksim,

On 2017/04/07 19:52, Maksim Milyutin wrote:
> On 07.04.2017 13:05, Etsuro Fujita wrote:
>> On 2016/12/09 19:46, Maksim Milyutin wrote:
>>> I would like to work on two tasks:
>>> - insert (and eventually update) tuple routing for foreign partition.
>>> - the ability to create an index on the parent and have all of the children inherit it;
>>
>> There seems to be no work on the first one, so I'd like to work on that.
>
> Yes, you can start to work on this, I'll join later as a reviewer.

Great! I added the patch to the next commitfest:
https://commitfest.postgresql.org/14/1184/

Sorry for the delay.

Best regards,
Etsuro Fujita
Re: [HACKERS] Declarative partitioning - another take
On Wed, May 10, 2017 at 6:50 AM, Etsuro Fujita wrote:
> I started working on this. I agree that the changes made in
> make_modifytable would be unacceptable, but I'd vote for Amit's idea of
> passing a modified PlannerInfo to PlanForeignModify so that the FDW can do
> query planning for INSERT into a foreign partition in the same way as for
> INSERT into a non-partition foreign table. (Though, I think we should
> generate a more-valid-looking working-copy of the PlannerInfo which has
> Query with the foreign partition as target.) I'm not sure it's a good idea
> to add a new FDW API or modify the existing one such as PlanForeignModify
> for this purpose.

Thanks for working on this, but I think it would be better to start a new
thread for this discussion. And probably also for any other issues that
come up. This thread has gotten so long that between the original
discussion and commit of the patch and discussion of multiple follow-on
commits and patches that it's very hard to follow.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On 2016/11/18 1:43, Robert Haas wrote:
> On Thu, Nov 17, 2016 at 6:27 AM, Amit Langote wrote:
>>> - The code in make_modifytable() to swap out the rte_array for a fake
>>> one looks like an unacceptable kludge. I don't know offhand what a
>>> better design would look like, but what you've got is really ugly.
>>
>> Agree that it looks horrible. The problem is we don't add partition
>> (child table) RTEs when planning an insert on the parent and FDW
>> partitions can't do without some planner handling - planForeignModify()
>> expects a valid PlannerInfo for deparsing target lists (basically, to be
>> able to use planner_rt_fetch()).
>
> If it's only needed for foreign tables, how about for v1 we just throw an
> error and say errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("cannot
> route inserted tuples to a foreign table") for now. We can come back and
> fix it later. Doing more inheritance expansion
>
> Coming up with some new FDW API or some modification to the existing one
> is probably better, but I don't really want to get hung up on that right
> now.

I started working on this. I agree that the changes made in
make_modifytable would be unacceptable, but I'd vote for Amit's idea of
passing a modified PlannerInfo to PlanForeignModify so that the FDW can do
query planning for INSERT into a foreign partition in the same way as for
INSERT into a non-partition foreign table. (Though, I think we should
generate a more-valid-looking working-copy of the PlannerInfo which has
Query with the foreign partition as target.) I'm not sure it's a good idea
to add a new FDW API or modify the existing one such as PlanForeignModify
for this purpose.

Best regards,
Etsuro Fujita
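
For context, the case under discussion is an INSERT on a partitioned parent
whose matching partition is a postgres_fdw foreign table. The following is a
minimal sketch only; the server, user mapping, and table names are made up,
and it assumes a remote table of the same shape already exists. In v10 as
committed, the insert through the parent is rejected rather than routed
through the FDW, which is what the proposed PlanForeignModify change would
enable.

CREATE EXTENSION postgres_fdw;
CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (dbname 'shard1');
CREATE USER MAPPING FOR CURRENT_USER SERVER shard1;

CREATE TABLE measurement (logdate date, value int) PARTITION BY RANGE (logdate);

-- A foreign table can itself be a partition.
CREATE FOREIGN TABLE measurement_y2017
    PARTITION OF measurement FOR VALUES FROM ('2017-01-01') TO ('2018-01-01')
    SERVER shard1 OPTIONS (table_name 'measurement_y2017');

-- Inserting into the foreign partition directly goes through the FDW as usual,
-- but routing an insert on the parent to that partition is not supported yet.
INSERT INTO measurement_y2017 VALUES ('2017-06-01', 1);
INSERT INTO measurement VALUES ('2017-06-01', 1);  -- rejected in v10
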
Re: [HACKERS] Declarative partitioning - another take
On 2017/05/10 12:59, Robert Haas wrote:
> On Tue, May 9, 2017 at 11:54 PM, Thomas Munro wrote:
>> +A statement that targets a parent table in a inheritance or partitioning
>>
>> A tiny typo: s/a inheritance/an inheritance/
>
> Now he tells me.

Thanks both.

Regards,
Amit
Re: [HACKERS] Declarative partitioning - another take
On Tue, May 9, 2017 at 11:54 PM, Thomas Munro wrote:
> +A statement that targets a parent table in a inheritance or partitioning
>
> A tiny typo: s/a inheritance/an inheritance/

Now he tells me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On Wed, May 10, 2017 at 3:51 PM, Robert Haas wrote:
> On Sun, May 7, 2017 at 9:44 PM, Amit Langote wrote:
>> I think that makes sense. Modified it to read: "A statement that targets
>> a parent table in a inheritance or partitioning hierarchy..." in the
>> attached updated patch.
>
> LGTM. Committed.

+A statement that targets a parent table in a inheritance or partitioning

A tiny typo: s/a inheritance/an inheritance/

--
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] Declarative partitioning - another take
On Sun, May 7, 2017 at 9:44 PM, Amit Langote wrote:
> I think that makes sense. Modified it to read: "A statement that targets
> a parent table in a inheritance or partitioning hierarchy..." in the
> attached updated patch.

LGTM. Committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On 2017/05/08 10:22, Thomas Munro wrote:
> On Mon, May 8, 2017 at 12:47 PM, Amit Langote wrote:
>> On 2017/05/03 2:48, Robert Haas wrote:
>>> On Tue, May 2, 2017 at 3:30 AM, Amit Langote wrote:
>>>> You're right. I agree that whatever text we add here should be pointing
>>>> out that statement-level triggers of affected child tables are not fired,
>>>> when root parent is specified in the command.
>>>>
>>>> Since there was least some talk of changing that behavior for regular
>>>> inheritance so that statement triggers of any affected children are fired
>>>> [1], I thought we shouldn't say something general that applies to both
>>>> inheritance and partitioning. But since nothing has happened in that
>>>> regard, we might as well.
>>>>
>>>> How about the attached?
>>>
>>> Looks better, but I think we should say "statement" instead of
>>> "operation" for consistency with the previous paragraph, and it
>>> certainly shouldn't be capitalized.
>>
>> Agreed, done. Attached updated patch.
>
> +A statement that targets the root table in a inheritance or partitioning
> +hierarchy does not cause the statement-level triggers of affected child
> +tables to be fired; only the root table's statement-level triggers are
> +fired. However, row-level triggers of any affected child tables will be
> +fired.
> +
> +
> +
>
> Why talk specifically about the "root" table? Wouldn't we describe
> the situation more generally if we said [a,the] "parent"?

I think that makes sense. Modified it to read: "A statement that targets
a parent table in a inheritance or partitioning hierarchy..." in the
attached updated patch.

Thanks,
Amit

>From 5c2c453235d5eedf857a5e7123337aae5aedd761 Mon Sep 17 00:00:00 2001
From: amit
Date: Mon, 24 Apr 2017 14:55:08 +0900
Subject: [PATCH] Clarify statement trigger behavior with inheritance

---
 doc/src/sgml/trigger.sgml | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/doc/src/sgml/trigger.sgml b/doc/src/sgml/trigger.sgml
index 6f8416dda7..ce76a1f042 100644
--- a/doc/src/sgml/trigger.sgml
+++ b/doc/src/sgml/trigger.sgml
@@ -123,6 +123,14 @@
+A statement that targets a parent table in a inheritance or partitioning
+hierarchy does not cause the statement-level triggers of affected child
+tables to be fired; only the parent table's statement-level triggers are
+fired. However, row-level triggers of any affected child tables will be
+fired.
+
+
+
 If an INSERT contains an ON CONFLICT
 DO UPDATE clause, it is possible that the effects of
 all row-level BEFORE INSERT triggers
--
2.11.0
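
A concrete illustration of the paragraph being added above; the object and
trigger names here are made up for the sketch and do not come from the
thread. A DELETE that targets the parent fires only the parent's
statement-level trigger, while row-level triggers on the affected partition
still fire.

CREATE TABLE parent (a int) PARTITION BY LIST (a);
CREATE TABLE child PARTITION OF parent FOR VALUES IN (1);
INSERT INTO parent VALUES (1);

CREATE FUNCTION note_trigger() RETURNS trigger AS $$
BEGIN
    RAISE NOTICE '% % trigger on %', TG_WHEN, TG_LEVEL, TG_TABLE_NAME;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER parent_stmt AFTER DELETE ON parent
    FOR EACH STATEMENT EXECUTE PROCEDURE note_trigger();
CREATE TRIGGER child_stmt AFTER DELETE ON child
    FOR EACH STATEMENT EXECUTE PROCEDURE note_trigger();
CREATE TRIGGER child_row AFTER DELETE ON child
    FOR EACH ROW EXECUTE PROCEDURE note_trigger();

-- Targets the parent: parent_stmt and child_row fire, child_stmt does not.
DELETE FROM parent WHERE a = 1;
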
Re: [HACKERS] Declarative partitioning - another take
On Mon, May 8, 2017 at 12:47 PM, Amit Langote wrote:
> On 2017/05/03 2:48, Robert Haas wrote:
>> On Tue, May 2, 2017 at 3:30 AM, Amit Langote wrote:
>>> You're right. I agree that whatever text we add here should be pointing
>>> out that statement-level triggers of affected child tables are not fired,
>>> when root parent is specified in the command.
>>>
>>> Since there was least some talk of changing that behavior for regular
>>> inheritance so that statement triggers of any affected children are fired
>>> [1], I thought we shouldn't say something general that applies to both
>>> inheritance and partitioning. But since nothing has happened in that
>>> regard, we might as well.
>>>
>>> How about the attached?
>>
>> Looks better, but I think we should say "statement" instead of
>> "operation" for consistency with the previous paragraph, and it
>> certainly shouldn't be capitalized.
>
> Agreed, done. Attached updated patch.

+A statement that targets the root table in a inheritance or partitioning
+hierarchy does not cause the statement-level triggers of affected child
+tables to be fired; only the root table's statement-level triggers are
+fired. However, row-level triggers of any affected child tables will be
+fired.
+
+
+

Why talk specifically about the "root" table? Wouldn't we describe
the situation more generally if we said [a,the] "parent"?

--
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] Declarative partitioning - another take
On 2017/05/03 2:48, Robert Haas wrote:
> On Tue, May 2, 2017 at 3:30 AM, Amit Langote wrote:
>> You're right. I agree that whatever text we add here should be pointing
>> out that statement-level triggers of affected child tables are not fired,
>> when root parent is specified in the command.
>>
>> Since there was least some talk of changing that behavior for regular
>> inheritance so that statement triggers of any affected children are fired
>> [1], I thought we shouldn't say something general that applies to both
>> inheritance and partitioning. But since nothing has happened in that
>> regard, we might as well.
>>
>> How about the attached?
>
> Looks better, but I think we should say "statement" instead of
> "operation" for consistency with the previous paragraph, and it
> certainly shouldn't be capitalized.

Agreed, done. Attached updated patch.

Thanks,
Amit

>From 1d7e383c6d89ebabacc7aa3f7d9987779daaa4fb Mon Sep 17 00:00:00 2001
From: amit
Date: Mon, 24 Apr 2017 14:55:08 +0900
Subject: [PATCH] Clarify statement trigger behavior with inheritance

---
 doc/src/sgml/trigger.sgml | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/doc/src/sgml/trigger.sgml b/doc/src/sgml/trigger.sgml
index 6f8416dda7..89e8c01a71 100644
--- a/doc/src/sgml/trigger.sgml
+++ b/doc/src/sgml/trigger.sgml
@@ -123,6 +123,14 @@
+A statement that targets the root table in a inheritance or partitioning
+hierarchy does not cause the statement-level triggers of affected child
+tables to be fired; only the root table's statement-level triggers are
+fired. However, row-level triggers of any affected child tables will be
+fired.
+
+
+
 If an INSERT contains an ON CONFLICT
 DO UPDATE clause, it is possible that the effects of
 all row-level BEFORE INSERT triggers
--
2.11.0
Re: [HACKERS] Declarative partitioning - another take
On Tue, May 2, 2017 at 3:30 AM, Amit Langote wrote:
> You're right. I agree that whatever text we add here should be pointing
> out that statement-level triggers of affected child tables are not fired,
> when root parent is specified in the command.
>
> Since there was least some talk of changing that behavior for regular
> inheritance so that statement triggers of any affected children are fired
> [1], I thought we shouldn't say something general that applies to both
> inheritance and partitioning. But since nothing has happened in that
> regard, we might as well.
>
> How about the attached?

Looks better, but I think we should say "statement" instead of
"operation" for consistency with the previous paragraph, and it
certainly shouldn't be capitalized.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On 2017/05/01 21:30, Robert Haas wrote:
> On Mon, May 1, 2017 at 12:18 AM, Amit Langote wrote:
>> Attached updated patch.
>
> Committed, except for this bit:

Thanks.

> +A statement-level trigger defined on partitioned tables is fired only
> +once for the table itself, not once for every table in the partitioning
> +hierarchy. However, row-level triggers of any affected leaf partitions
> +will be fired.
>
> The first sentence there has a number of issues. Grammatically, there
> is an agreement problem: trigger is singular, but partitioned table is
> plural, and one trigger isn't defined across multiple tables. It
> would have to say something like "A statement-level trigger defined on
> a partitioned table". But even with that correction, it's not really
> saying what you want it to say. Nobody would expect that the same
> statement-level trigger would be fired multiple times. The issue is
> whether statement-level triggers on the partitions themselves will be
> fired, not the statement-level trigger on the partitioned table.
> Also, if this applies to inheritance as well as partitioning, then why
> mention only partitioning? I think we might want to document
> something more here, but not like this.

You're right. I agree that whatever text we add here should be pointing
out that statement-level triggers of affected child tables are not fired,
when root parent is specified in the command.

Since there was least some talk of changing that behavior for regular
inheritance so that statement triggers of any affected children are fired
[1], I thought we shouldn't say something general that applies to both
inheritance and partitioning. But since nothing has happened in that
regard, we might as well.

How about the attached?

Thanks,
Amit

[1] https://www.postgresql.org/message-id/cd282adde5b70b20c57f53bb9ab75...@biglumber.com

>From d2d27ca458fbd341efd5ae231f977385a5e3c951 Mon Sep 17 00:00:00 2001
From: amit
Date: Mon, 24 Apr 2017 14:55:08 +0900
Subject: [PATCH] Clarify statement trigger behavior with inheritance

---
 doc/src/sgml/trigger.sgml | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/doc/src/sgml/trigger.sgml b/doc/src/sgml/trigger.sgml
index 6f8416dda7..ef41085744 100644
--- a/doc/src/sgml/trigger.sgml
+++ b/doc/src/sgml/trigger.sgml
@@ -123,6 +123,14 @@
+An Operation that targets the root table in a inheritance or partitioning
+hierarchy does not cause the statement-level triggers of affected child
+tables to be fired; only the root table's statement-level triggers are
+fired. However, row-level triggers of any affected child tables will be
+fired.
+
+
+
 If an INSERT contains an ON CONFLICT
 DO UPDATE clause, it is possible that the effects of
 all row-level BEFORE INSERT triggers
--
2.11.0
Re: [HACKERS] Declarative partitioning - another take
On Mon, May 1, 2017 at 12:18 AM, Amit Langote wrote:
> Attached updated patch.

Committed, except for this bit:

+A statement-level trigger defined on partitioned tables is fired only
+once for the table itself, not once for every table in the partitioning
+hierarchy. However, row-level triggers of any affected leaf partitions
+will be fired.

The first sentence there has a number of issues. Grammatically, there
is an agreement problem: trigger is singular, but partitioned table is
plural, and one trigger isn't defined across multiple tables. It
would have to say something like "A statement-level trigger defined on
a partitioned table". But even with that correction, it's not really
saying what you want it to say. Nobody would expect that the same
statement-level trigger would be fired multiple times. The issue is
whether statement-level triggers on the partitions themselves will be
fired, not the statement-level trigger on the partitioned table.
Also, if this applies to inheritance as well as partitioning, then why
mention only partitioning? I think we might want to document
something more here, but not like this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On 2017/04/29 6:56, David Fetter wrote:
> On Fri, Apr 28, 2017 at 06:29:48PM +0900, Amit Langote wrote:
>> On 2017/04/28 7:36, David Fetter wrote:
>>> Did I notice correctly that there's no way to handle transition tables
>>> for partitions in AFTER STATEMENT triggers?
>>
>> Did you mean to ask about AFTER STATEMENT triggers defined on
>> "partitioned" tables?  Specifying transition table for them is disallowed
>> at all.
>>
>> ERROR:  "p" is a partitioned table
>> DETAIL:  Triggers on partitioned tables cannot have transition tables.
>
> OK, I suppose.  It wasn't clear from the documentation.
>
>> Triggers created on (leaf) partitions *do* allow specifying transition table.
>
> That includes the upcoming "default" tables, I presume.

If a "default" table is also a "leaf" table, then yes.  A default
table/partition can also be itself a partitioned table, in which case, it
won't allow triggers that require transition tables.  AFAICS, it's the
table's being partitioned that stops it from supporting transition tables,
not whether it is a "default" partition or not.

>> Or are you asking something else altogether?
>
> I was just fuzzy on the interactions among these features.
>
>>> If not, I'm not suggesting that this be added at this late date, but
>>> we might want to document that.
>>
>> I don't see mentioned in the documentation that such triggers cannot be
>> defined on partitioned tables.  Is that what you are saying should be
>> documented?
>
> Yes, but I bias toward documenting a lot, and this restriction could
> go away in some future version, which would make things more confusing
> in the long run.

Yeah, it would be a good idea to document this.

> I'm picturing a conversation in 2020 that goes
> something like this:
>
> "On 10, you could have AFTER STATEMENT triggers on tables, foreigh
> tables, and leaf partition tables which referenced transition tables,
> but not on DEFAULT partitions.  On 11, you could on DEFAULT partition
> tables.  From 12 onward, you can have transition tables on any
> relation."

What we could document now is that partitioned tables don't allow
specifying triggers that reference transition tables.  Although, I am
wondering where this note really belongs - the partitioning chapter, the
triggers chapter or the CREATE TRIGGER reference page?  Maybe, Kevin and
Thomas have something to say about that.

If it turns out that the partitioning chapter is a good place, here is a
documentation patch.

Thanks,
Amit

>From 289b4d906abb50451392d0efe13926f710952ca0 Mon Sep 17 00:00:00 2001
From: amit
Date: Mon, 1 May 2017 13:46:58 +0900
Subject: [PATCH] Document transition table trigger limitation of partitioned
 tables

---
 doc/src/sgml/ddl.sgml | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 84c4f20990..5f5a956d41 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3293,7 +3293,9 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
 Row triggers, if necessary, must be defined on individual partitions,
-not the partitioned table.
+not the partitioned tables.  Also, triggers that require accessing
+transition tables are not allowed to be defined on the partitioned
+tables.
--
2.11.0
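
For reference, a sketch of the v10 limitation the ddl.sgml text above
describes; the names loosely echo the documentation's measurement example
and are otherwise made up. A row-level trigger has to be created on each
leaf partition, because creating one on the partitioned table itself is
rejected in this release.

CREATE TABLE measurement (logdate date, value int) PARTITION BY RANGE (logdate);
CREATE TABLE measurement_y2008m02 PARTITION OF measurement
    FOR VALUES FROM ('2008-02-01') TO ('2008-03-01');

CREATE FUNCTION audit_row() RETURNS trigger AS $$
BEGIN
    RAISE NOTICE 'row inserted into %', TG_TABLE_NAME;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Rejected in v10: row-level triggers cannot be defined on the partitioned table.
CREATE TRIGGER measurement_ins AFTER INSERT ON measurement
    FOR EACH ROW EXECUTE PROCEDURE audit_row();

-- Works: the row-level trigger goes on the leaf partition instead.
CREATE TRIGGER measurement_y2008m02_ins AFTER INSERT ON measurement_y2008m02
    FOR EACH ROW EXECUTE PROCEDURE audit_row();
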
Re: [HACKERS] Declarative partitioning - another take
Thanks for the review.

On 2017/04/29 1:25, Robert Haas wrote:
> On Fri, Apr 28, 2017 at 2:13 AM, Amit Langote wrote:
>>>> By the way, code changes I made in the attached are such that a subsequent
>>>> patch could implement firing statement-level triggers of all the tables in
>>>> a partition hierarchy, which it seems we don't want to do.  Should then
>>>> the code be changed to not create ResultRelInfos of all the tables but
>>>> only the root table (the one mentioned in the command)?  You will see that
>>>> the patch adds fields named es_nonleaf_result_relations and
>>>> es_num_nonleaf_result_relations, whereas just es_root_result_relation
>>>> would perhaps do, for example.
>>>
>>> It seems better not to create any ResultRelInfos that we don't
>>> actually need, so +1 for such a revision to the patch.
>>
>> OK, done.  It took a bit more work than I thought.
>
> So, this seems weird, because rootResultRelIndex is initialized even
> when splan->partitioned_rels == NIL, but isn't actually valid in that
> case.  ExecInitModifyTable seems to think it's always valid, though.

OK, rootResultRelIndex is now set to a value >= 0 only when the node
modifies a partitioned table.  And then ExecInitModifyTable() checks if
the index is valid before initializing the root table info.

> I think the way that you've refactored fireBSTriggers and
> fireASTriggers is a bit confusing.  Instead of splitting out a
> separate function, how about just having the existing function begin
> with if (node->rootResultRelInfo) resultRelInfo =
> node->rootResultRelInfo; else resultRelInfo = node->resultRelInfo; ?
> I think the way you've coded it is a holdover from the earlier design
> where you were going to call it multiple times, but now that's not
> needed.

OK, done that way.

> Looks OK, otherwise.

Attached updated patch.

Thanks,
Amit

>From 53c964f9f1fd3e27824080b0e9e5b672fa53cdf0 Mon Sep 17 00:00:00 2001
From: amit
Date: Mon, 24 Apr 2017 14:55:08 +0900
Subject: [PATCH] Fire per-statement triggers of partitioned tables

Current implementation completely misses them, because it assumes that no
ResultRelInfos need to be created for partitioned tables, whereas they
*are* needed to fire the per-statment triggers, which we *do* allow to be
defined on the partitioned tables.

Reported By: Rajkumar Raghuwanshi
Patch by: Amit Langote
Report: https://www.postgresql.org/message-id/CAKcux6%3DwYospCRY2J4XEFuVy0L41S%3Dfic7rmkbsU-GXhhSbmBg%40mail.gmail.com
---
 doc/src/sgml/trigger.sgml               | 26 +++
 src/backend/executor/execMain.c         | 55 --
 src/backend/executor/nodeModifyTable.c  | 42 +
 src/backend/nodes/copyfuncs.c           |  2 +
 src/backend/nodes/outfuncs.c            |  3 ++
 src/backend/nodes/readfuncs.c           |  2 +
 src/backend/optimizer/plan/createplan.c |  1 +
 src/backend/optimizer/plan/planner.c    |  3 ++
 src/backend/optimizer/plan/setrefs.c    | 19 ++--
 src/include/nodes/execnodes.h           | 12 +
 src/include/nodes/plannodes.h           | 13 +-
 src/include/nodes/relation.h            |  1 +
 src/test/regress/expected/triggers.out  | 81 +
 src/test/regress/sql/triggers.sql       | 70
 14 files changed, 303 insertions(+), 27 deletions(-)

diff --git a/doc/src/sgml/trigger.sgml b/doc/src/sgml/trigger.sgml
index 2a718d7f47..5771bd5fdc 100644
--- a/doc/src/sgml/trigger.sgml
+++ b/doc/src/sgml/trigger.sgml
@@ -33,7 +33,8 @@
 A trigger is a specification that the database should automatically
 execute a particular function whenever a certain type of operation is
-performed. Triggers can be attached to tables, views, and foreign tables.
+performed. Triggers can be attached to tables (partitioned or not),
+views, and foreign tables.
@@ -111,14 +112,21 @@
 Statement-level BEFORE triggers naturally fire before the
 statement starts to do anything, while statement-level AFTER
 triggers fire at the very end of the statement. These types of
-triggers may be defined on tables or views. Row-level BEFORE
-triggers fire immediately before a particular row is operated on,
-while row-level AFTER triggers fire at the end of the
-statement (but before any statement-level AFTER triggers).
-These types of triggers may only be defined on tables and foreign tables.
-Row-level INSTEAD OF triggers may only be defined on views,
-and fire immediately as each row in the view is identified as needing to
-be operated on.
+triggers may be defined on tables, views, or foreign tables. Row-level
+BEFORE triggers fire immediately before a particular row is
+operated on, while row-level AFTER triggers fire at the end of
+the statement (but before any statement-level AFTER triggers).
+These types of triggers may only be defined on non-partitioned tables and
+foreign tables. Row-level INSTEAD OF triggers may o
Re: [HACKERS] Declarative partitioning - another take
On Mon, Apr 24, 2017 at 07:43:29PM +0900, Amit Langote wrote:
> On 2017/04/21 17:00, Rajkumar Raghuwanshi wrote:
> > I am able to create statement triggers at root partition, but these
> > triggers, not getting fired on updating partition.
>
> It would be great if you could check if the patches fix the issues.
>
> Also, adding this to the PostgreSQL 10 open items list.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Robert,
since you committed the patch believed to have created it, you own this
open item.  If some other commit is more relevant or if this does not
belong as a v10 open item, please let us know.  Otherwise, please observe
the policy on open item ownership[1] and send a status update within three
calendar days of this message.  Include a date for your subsequent status
update.  Testers may discover new open items at any time, and I want to
plan to get them all fixed well in advance of shipping v10.  Consequently,
I will appreciate your efforts toward speedy resolution.  Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
Re: [HACKERS] Declarative partitioning - another take
On Fri, Apr 28, 2017 at 06:29:48PM +0900, Amit Langote wrote:
> On 2017/04/28 7:36, David Fetter wrote:
> > On Thu, Apr 27, 2017 at 10:30:54AM +0900, Amit Langote wrote:
> >> On 2017/04/27 1:52, Robert Haas wrote:
> >>> On Tue, Apr 25, 2017 at 10:34 PM, Amit Langote wrote:
> >>>> FWIW, I too prefer the latter, that is, fire only the parent's triggers.
> >>>> In that case, applying only the patch 0001 will do.
> >>>
> >>> Do we need to update the documentation?
> >>
> >> Yes, I think we should.  How about as in the attached?
> >>
> >> By the way, code changes I made in the attached are such that a subsequent
> >> patch could implement firing statement-level triggers of all the tables in
> >> a partition hierarchy, which it seems we don't want to do.  Should then
> >> the code be changed to not create ResultRelInfos of all the tables but
> >> only the root table (the one mentioned in the command)?  You will see that
> >> the patch adds fields named es_nonleaf_result_relations and
> >> es_num_nonleaf_result_relations, whereas just es_root_result_relation
> >> would perhaps do, for example.
> >
> > Did I notice correctly that there's no way to handle transition tables
> > for partitions in AFTER STATEMENT triggers?
>
> Did you mean to ask about AFTER STATEMENT triggers defined on
> "partitioned" tables?  Specifying transition table for them is disallowed
> at all.
>
> ERROR:  "p" is a partitioned table
> DETAIL:  Triggers on partitioned tables cannot have transition tables.

OK, I suppose.  It wasn't clear from the documentation.

> Triggers created on (leaf) partitions *do* allow specifying transition table.

That includes the upcoming "default" tables, I presume.

> Or are you asking something else altogether?

I was just fuzzy on the interactions among these features.

> > If not, I'm not suggesting that this be added at this late date, but
> > we might want to document that.
>
> I don't see mentioned in the documentation that such triggers cannot be
> defined on partitioned tables.  Is that what you are saying should be
> documented?

Yes, but I bias toward documenting a lot, and this restriction could
go away in some future version, which would make things more confusing
in the long run.  I'm picturing a conversation in 2020 that goes
something like this:

"On 10, you could have AFTER STATEMENT triggers on tables, foreigh
tables, and leaf partition tables which referenced transition tables,
but not on DEFAULT partitions.  On 11, you could on DEFAULT partition
tables.  From 12 onward, you can have transition tables on any
relation."

Kevin?  Thomas?

Best,
David.

--
David Fetter http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Declarative partitioning - another take
On Fri, Apr 28, 2017 at 2:13 AM, Amit Langote wrote:
> It seems to me that there is no difference in behavior between
> inheritance-based and declarative partitioning as far as statement-level
> triggers are concerned (at least currently).  In both the cases, we fire
> statement-level triggers only for the table specified in the command.

OK.

>>> By the way, code changes I made in the attached are such that a subsequent
>>> patch could implement firing statement-level triggers of all the tables in
>>> a partition hierarchy, which it seems we don't want to do.  Should then
>>> the code be changed to not create ResultRelInfos of all the tables but
>>> only the root table (the one mentioned in the command)?  You will see that
>>> the patch adds fields named es_nonleaf_result_relations and
>>> es_num_nonleaf_result_relations, whereas just es_root_result_relation
>>> would perhaps do, for example.
>>
>> It seems better not to create any ResultRelInfos that we don't
>> actually need, so +1 for such a revision to the patch.
>
> OK, done.  It took a bit more work than I thought.

So, this seems weird, because rootResultRelIndex is initialized even
when splan->partitioned_rels == NIL, but isn't actually valid in that
case.  ExecInitModifyTable seems to think it's always valid, though.

I think the way that you've refactored fireBSTriggers and
fireASTriggers is a bit confusing.  Instead of splitting out a
separate function, how about just having the existing function begin
with if (node->rootResultRelInfo) resultRelInfo =
node->rootResultRelInfo; else resultRelInfo = node->resultRelInfo; ?
I think the way you've coded it is a holdover from the earlier design
where you were going to call it multiple times, but now that's not
needed.

Looks OK, otherwise.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On 2017/04/28 7:36, David Fetter wrote:
> On Thu, Apr 27, 2017 at 10:30:54AM +0900, Amit Langote wrote:
>> On 2017/04/27 1:52, Robert Haas wrote:
>>> On Tue, Apr 25, 2017 at 10:34 PM, Amit Langote wrote:
>>>> FWIW, I too prefer the latter, that is, fire only the parent's triggers.
>>>> In that case, applying only the patch 0001 will do.
>>>
>>> Do we need to update the documentation?
>>
>> Yes, I think we should.  How about as in the attached?
>>
>> By the way, code changes I made in the attached are such that a subsequent
>> patch could implement firing statement-level triggers of all the tables in
>> a partition hierarchy, which it seems we don't want to do.  Should then
>> the code be changed to not create ResultRelInfos of all the tables but
>> only the root table (the one mentioned in the command)?  You will see that
>> the patch adds fields named es_nonleaf_result_relations and
>> es_num_nonleaf_result_relations, whereas just es_root_result_relation
>> would perhaps do, for example.
>
> Did I notice correctly that there's no way to handle transition tables
> for partitions in AFTER STATEMENT triggers?

Did you mean to ask about AFTER STATEMENT triggers defined on
"partitioned" tables?  Specifying transition table for them is disallowed
at all.

ERROR:  "p" is a partitioned table
DETAIL:  Triggers on partitioned tables cannot have transition tables.

Triggers created on (leaf) partitions *do* allow specifying transition table.

Or are you asking something else altogether?

> If not, I'm not suggesting that this be added at this late date, but
> we might want to document that.

I don't see mentioned in the documentation that such triggers cannot be
defined on partitioned tables.  Is that what you are saying should be
documented?

Thanks,
Amit
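
A minimal sketch of the restriction described above, with made-up object
names (the transition-table syntax shown is the v10 REFERENCING clause):
the same trigger definition is rejected on the partitioned table but
accepted on a leaf partition.

CREATE TABLE p (a int) PARTITION BY RANGE (a);
CREATE TABLE p1 PARTITION OF p FOR VALUES FROM (1) TO (10);

CREATE FUNCTION count_new_rows() RETURNS trigger AS $$
BEGIN
    RAISE NOTICE 'inserted % row(s)', (SELECT count(*) FROM new_rows);
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

-- Rejected: a partitioned table cannot have triggers with transition tables.
CREATE TRIGGER p_after_ins AFTER INSERT ON p
    REFERENCING NEW TABLE AS new_rows
    FOR EACH STATEMENT EXECUTE PROCEDURE count_new_rows();

-- Accepted: the equivalent trigger on the leaf partition.
CREATE TRIGGER p1_after_ins AFTER INSERT ON p1
    REFERENCING NEW TABLE AS new_rows
    FOR EACH STATEMENT EXECUTE PROCEDURE count_new_rows();
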
Re: [HACKERS] Declarative partitioning - another take
Hi Rajkumar,

On 2017/04/28 17:11, Rajkumar Raghuwanshi wrote:
> On Fri, Apr 28, 2017 at 11:43 AM, Amit Langote <
>> Updated patch attached.
>
> I have applied given patch, could see below behaviour with statement
> trigger.
>
> When trying to delete value within partition range, triggers got fired
> (with zero row affected) as expected, but if trying to delete value which
> is outside of partition range (with zero row affected), No trigger fired.
> is this expected??

Good catch.  I'm afraid though that this is not a defect of this patch,
but some unrelated (maybe) bug, which affects not only the partitioned
tables but inheritance in general.

Problem is that the plan created is such that the executor has no
opportunity to fire the trigger in question, because the plan contains no
information about which table is affected by the statement.  You can see
that with inheritance.  See below:

create table foo ();
create table bar () inherits (foo);

create or replace function trig_notice() returns trigger as $$
  begin raise notice 'trigger fired'; return null; end;
$$ language plpgsql;

create trigger foo_del_before before delete on foo
  for each statement execute procedure trig_notice();

explain delete from foo where false;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

-- no trigger fired
delete from foo where false;
DELETE 0

Trigger *is* fired though, if inheritance is not used.

explain delete from only foo where false;
                    QUERY PLAN
--------------------------------------------------
 Delete on foo  (cost=0.00..0.00 rows=0 width=0)
   ->  Result  (cost=0.00..0.00 rows=0 width=6)
         One-Time Filter: false
(3 rows)

delete from only foo where false;
NOTICE:  trigger fired
DELETE 0

I'm not sure how to go about fixing this, if at all.  Or to fix it at
least for partitioned tables.  Would like to hear what others think.

Thanks,
Amit
Re: [HACKERS] Declarative partitioning - another take
On Fri, Apr 28, 2017 at 11:43 AM, Amit Langote <
langote_amit...@lab.ntt.co.jp> wrote:

> Updated patch attached.

Hi Amit,

I have applied given patch, could see below behaviour with statement trigger.

When trying to delete value within partition range, triggers got fired
(with zero row affected) as expected, but if trying to delete value which
is outside of partition range (with zero row affected), No trigger fired.
is this expected??

Below are steps to reproduce.

CREATE TABLE trigger_test_table (a INT, b INT) PARTITION BY RANGE(a);
CREATE TABLE trigger_test_table1 PARTITION OF trigger_test_table FOR VALUES FROM (0) to (6);
INSERT INTO trigger_test_table (a,b) SELECT i,i FROM generate_series(1,3)i;

CREATE TABLE trigger_test_statatics(TG_NAME varchar,TG_TABLE_NAME varchar,TG_LEVEL varchar,TG_WHEN varchar, TG_OP varchar);

CREATE FUNCTION trigger_test_procedure() RETURNS TRIGGER AS $ttp$
BEGIN
  INSERT INTO trigger_test_statatics SELECT TG_NAME,TG_TABLE_NAME,TG_LEVEL,TG_WHEN,TG_OP;
  RETURN OLD;
END;
$ttp$ LANGUAGE plpgsql;

CREATE TRIGGER trigger_test11 AFTER DELETE ON trigger_test_table FOR EACH STATEMENT EXECUTE PROCEDURE trigger_test_procedure();
CREATE TRIGGER trigger_test12 AFTER DELETE ON trigger_test_table1 FOR EACH STATEMENT EXECUTE PROCEDURE trigger_test_procedure();

postgres=# DELETE FROM trigger_test_table WHERE a = 5;
DELETE 0
postgres=# SELECT * FROM trigger_test_statatics;
    tg_name     |   tg_table_name    | tg_level  | tg_when | tg_op
----------------+--------------------+-----------+---------+--------
 trigger_test11 | trigger_test_table | STATEMENT | AFTER   | DELETE
(1 row)

TRUNCATE TABLE trigger_test_statatics;

--statement trigger NOT fired, when trying to delete data outside partition range.
postgres=# DELETE FROM trigger_test_table WHERE a = 10;
DELETE 0
postgres=# SELECT * FROM trigger_test_statatics;
 tg_name | tg_table_name | tg_level | tg_when | tg_op
---------+---------------+----------+---------+-------
(0 rows)

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation
Re: [HACKERS] Declarative partitioning - another take
Thanks for taking a look.

On 2017/04/28 5:24, Robert Haas wrote:
> On Wed, Apr 26, 2017 at 9:30 PM, Amit Langote wrote:
>>> Do we need to update the documentation?
>>
>> Yes, I think we should.  How about as in the attached?
>
> Looks reasonable, but I was thinking you might also update the section
> which contrasts inheritance-based partitioning with declarative
> partitioning.

It seems to me that there is no difference in behavior between
inheritance-based and declarative partitioning as far as statement-level
triggers are concerned (at least currently).  In both the cases, we fire
statement-level triggers only for the table specified in the command.

Maybe, we will fix things someday so that statement-level triggers will be
fired for all the tables in a inheritance hierarchy when the root parent
is updated or deleted, but that's currently not true.  We may never
implement that behavior for declarative partitioned tables though, so
there will be a difference if and when we implement the former.  Am I
missing something?

>> By the way, code changes I made in the attached are such that a subsequent
>> patch could implement firing statement-level triggers of all the tables in
>> a partition hierarchy, which it seems we don't want to do.  Should then
>> the code be changed to not create ResultRelInfos of all the tables but
>> only the root table (the one mentioned in the command)?  You will see that
>> the patch adds fields named es_nonleaf_result_relations and
>> es_num_nonleaf_result_relations, whereas just es_root_result_relation
>> would perhaps do, for example.
>
> It seems better not to create any ResultRelInfos that we don't
> actually need, so +1 for such a revision to the patch.

OK, done.  It took a bit more work than I thought.

Updated patch attached.

Thanks,
Amit

>From a8b584f09bc02b6c16c33e816fc12c7e443dd65c Mon Sep 17 00:00:00 2001
From: amit
Date: Mon, 24 Apr 2017 14:55:08 +0900
Subject: [PATCH] Fire per-statement triggers of partitioned tables

Current implementation completely misses them, because it assumes that no
ResultRelInfos need to be created for partitioned tables, whereas they
*are* needed to fire the per-statment triggers, which we *do* allow to be
defined on the partitioned tables.

Reported By: Rajkumar Raghuwanshi
Patch by: Amit Langote
Report: https://www.postgresql.org/message-id/CAKcux6%3DwYospCRY2J4XEFuVy0L41S%3Dfic7rmkbsU-GXhhSbmBg%40mail.gmail.com
---
 doc/src/sgml/trigger.sgml               | 26 +++
 src/backend/executor/execMain.c         | 55 --
 src/backend/executor/nodeModifyTable.c  | 64 ++
 src/backend/nodes/copyfuncs.c           |  2 +
 src/backend/nodes/outfuncs.c            |  3 ++
 src/backend/nodes/readfuncs.c           |  2 +
 src/backend/optimizer/plan/createplan.c |  1 +
 src/backend/optimizer/plan/planner.c    |  3 ++
 src/backend/optimizer/plan/setrefs.c    | 19 ++--
 src/include/nodes/execnodes.h           | 12 +
 src/include/nodes/plannodes.h           | 13 +-
 src/include/nodes/relation.h            |  1 +
 src/test/regress/expected/triggers.out  | 81 +
 src/test/regress/sql/triggers.sql       | 70
 14 files changed, 323 insertions(+), 29 deletions(-)

diff --git a/doc/src/sgml/trigger.sgml b/doc/src/sgml/trigger.sgml
index 2a718d7f47..5771bd5fdc 100644
--- a/doc/src/sgml/trigger.sgml
+++ b/doc/src/sgml/trigger.sgml
@@ -33,7 +33,8 @@
 A trigger is a specification that the database should automatically
 execute a particular function whenever a certain type of operation is
-performed. Triggers can be attached to tables, views, and foreign tables.
+performed. Triggers can be attached to tables (partitioned or not),
+views, and foreign tables.
@@ -111,14 +112,21 @@
 Statement-level BEFORE triggers naturally fire before the
 statement starts to do anything, while statement-level AFTER
 triggers fire at the very end of the statement. These types of
-triggers may be defined on tables or views. Row-level BEFORE
-triggers fire immediately before a particular row is operated on,
-while row-level AFTER triggers fire at the end of the
-statement (but before any statement-level AFTER triggers).
-These types of triggers may only be defined on tables and foreign tables.
-Row-level INSTEAD OF triggers may only be defined on views,
-and fire immediately as each row in the view is identified as needing to
-be operated on.
+triggers may be defined on tables, views, or foreign tables. Row-level
+BEFORE triggers fire immediately before a particular row is
+operated on, while row-level AFTER triggers fire at the end of
+the statement (but before any statement-level AFTER triggers).
+These types of triggers may only be defined on non-partitioned tables and
+foreign tables. Row-level INSTEAD OF triggers may only be
+defined on views,
Re: [HACKERS] Declarative partitioning - another take
On Thu, Apr 27, 2017 at 10:30:54AM +0900, Amit Langote wrote:
> On 2017/04/27 1:52, Robert Haas wrote:
> > On Tue, Apr 25, 2017 at 10:34 PM, Amit Langote wrote:
> >> FWIW, I too prefer the latter, that is, fire only the parent's triggers.
> >> In that case, applying only the patch 0001 will do.
> >
> > Do we need to update the documentation?
>
> Yes, I think we should.  How about as in the attached?
>
> By the way, code changes I made in the attached are such that a subsequent
> patch could implement firing statement-level triggers of all the tables in
> a partition hierarchy, which it seems we don't want to do.  Should then
> the code be changed to not create ResultRelInfos of all the tables but
> only the root table (the one mentioned in the command)?  You will see that
> the patch adds fields named es_nonleaf_result_relations and
> es_num_nonleaf_result_relations, whereas just es_root_result_relation
> would perhaps do, for example.

Did I notice correctly that there's no way to handle transition tables
for partitions in AFTER STATEMENT triggers?

If not, I'm not suggesting that this be added at this late date, but
we might want to document that.

Best,
David.

--
David Fetter http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Declarative partitioning - another take
On Wed, Apr 26, 2017 at 9:30 PM, Amit Langote wrote:
>> Do we need to update the documentation?
>
> Yes, I think we should.  How about as in the attached?

Looks reasonable, but I was thinking you might also update the section
which contrasts inheritance-based partitioning with declarative
partitioning.

> By the way, code changes I made in the attached are such that a subsequent
> patch could implement firing statement-level triggers of all the tables in
> a partition hierarchy, which it seems we don't want to do.  Should then
> the code be changed to not create ResultRelInfos of all the tables but
> only the root table (the one mentioned in the command)?  You will see that
> the patch adds fields named es_nonleaf_result_relations and
> es_num_nonleaf_result_relations, whereas just es_root_result_relation
> would perhaps do, for example.

It seems better not to create any ResultRelInfos that we don't
actually need, so +1 for such a revision to the patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On 2017/04/27 1:52, Robert Haas wrote:
> On Tue, Apr 25, 2017 at 10:34 PM, Amit Langote wrote:
>> FWIW, I too prefer the latter, that is, fire only the parent's triggers.
>> In that case, applying only the patch 0001 will do.
>
> Do we need to update the documentation?

Yes, I think we should.  How about as in the attached?

By the way, code changes I made in the attached are such that a subsequent
patch could implement firing statement-level triggers of all the tables in
a partition hierarchy, which it seems we don't want to do.  Should then
the code be changed to not create ResultRelInfos of all the tables but
only the root table (the one mentioned in the command)?  You will see that
the patch adds fields named es_nonleaf_result_relations and
es_num_nonleaf_result_relations, whereas just es_root_result_relation
would perhaps do, for example.

Thanks,
Amit

>From c8b785a77a2a193906967762066e03e09de80442 Mon Sep 17 00:00:00 2001
From: amit
Date: Mon, 24 Apr 2017 14:55:08 +0900
Subject: [PATCH 1/2] Fire per-statement triggers of partitioned tables

Current implementation completely misses them, because it assumes that no
ResultRelInfos need to be created for partitioned tables, whereas they
*are* needed to fire the per-statment triggers, which we *do* allow to be
defined on the partitioned tables.

Reported By: Rajkumar Raghuwanshi
Patch by: Amit Langote
Report: https://www.postgresql.org/message-id/CAKcux6%3DwYospCRY2J4XEFuVy0L41S%3Dfic7rmkbsU-GXhhSbmBg%40mail.gmail.com
---
 doc/src/sgml/trigger.sgml               | 26 -
 src/backend/executor/execMain.c         | 33 -
 src/backend/executor/nodeModifyTable.c  | 66 -
 src/backend/nodes/copyfuncs.c           |  1 +
 src/backend/nodes/outfuncs.c            |  1 +
 src/backend/nodes/readfuncs.c           |  1 +
 src/backend/optimizer/plan/createplan.c |  1 +
 src/backend/optimizer/plan/setrefs.c    |  4 ++
 src/include/nodes/execnodes.h           | 11 ++
 src/include/nodes/plannodes.h           |  2 +
 src/test/regress/expected/triggers.out  | 54 +++
 src/test/regress/sql/triggers.sql       | 48
 12 files changed, 227 insertions(+), 21 deletions(-)

diff --git a/doc/src/sgml/trigger.sgml b/doc/src/sgml/trigger.sgml
index 2a718d7f47..5771bd5fdc 100644
--- a/doc/src/sgml/trigger.sgml
+++ b/doc/src/sgml/trigger.sgml
@@ -33,7 +33,8 @@
 A trigger is a specification that the database should automatically
 execute a particular function whenever a certain type of operation is
-performed. Triggers can be attached to tables, views, and foreign tables.
+performed. Triggers can be attached to tables (partitioned or not),
+views, and foreign tables.
@@ -111,14 +112,21 @@
 Statement-level BEFORE triggers naturally fire before the
 statement starts to do anything, while statement-level AFTER
 triggers fire at the very end of the statement. These types of
-triggers may be defined on tables or views. Row-level BEFORE
-triggers fire immediately before a particular row is operated on,
-while row-level AFTER triggers fire at the end of the
-statement (but before any statement-level AFTER triggers).
-These types of triggers may only be defined on tables and foreign tables.
-Row-level INSTEAD OF triggers may only be defined on views,
-and fire immediately as each row in the view is identified as needing to
-be operated on.
+triggers may be defined on tables, views, or foreign tables. Row-level
+BEFORE triggers fire immediately before a particular row is
+operated on, while row-level AFTER triggers fire at the end of
+the statement (but before any statement-level AFTER triggers).
+These types of triggers may only be defined on non-partitioned tables and
+foreign tables. Row-level INSTEAD OF triggers may only be
+defined on views, and fire immediately as each row in the view is
+identified as needing to be operated on.
+
+
+
+A statement-level trigger defined on partitioned tables is fired only
+once for the table itself, not once for every table in the partitioning
+hierarchy. However, row-level triggers of any affected leaf partitions
+will be fired.

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 5c12fb457d..586b396b3e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -861,18 +861,35 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 	/*
 	 * In the partitioned result relation case, lock the non-leaf result
-	 * relations too.  We don't however need ResultRelInfos for them.
+	 * relations too.  We also need ResultRelInfos so that per-statement
+	 * triggers defined on these relations can be fired.
 	 */
 	if (plannedstmt->nonleafResultRelations)
 	{
+		numResultRelations =
+			list_length(plannedstmt->nonleafResultRelations);
+
+		res
Re: [HACKERS] Declarative partitioning - another take
On Tue, Apr 25, 2017 at 10:34 PM, Amit Langote wrote:
> FWIW, I too prefer the latter, that is, fire only the parent's triggers.
> In that case, applying only the patch 0001 will do.

Do we need to update the documentation?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On 26 April 2017 at 00:28, Robert Haas wrote:
> So what I'd prefer -- on
> the totally unprincipled basis that it would let us improve
> performance in the future -- if you operate on a partition directly,
> you fire the partition's triggers, but if you operate on the parent,
> only the parent's triggers fire.

I would also opt for this behaviour.

Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] Declarative partitioning - another take
Hi,

Thanks for testing.

On 2017/04/25 19:03, Rajkumar Raghuwanshi wrote:
> Thanks for looking into it.  I have applied fixes and checked for triggers.
> I could see difference in behaviour of statement triggers for INSERT and
> UPDATE, for insert only root partition triggers are getting fired but for
> update root as well as child partition table triggers both getting fired.
> is this expected??

Yes, because I didn't implement anything for the insert case yet.

I posed a question whether to fire partitions' per-statement triggers when
inserting data through the root table.  Robert replied [1] that it would
be desirable to not fire partitions' per-statement triggers if the root
table is mentioned in the query; only fire their per-row triggers if any.
It already works that way for inserts, and applying only 0001 will get you
the same for update/delete.

Patch 0002 is to enable firing partition's per-statement triggers even if
the root table is mentioned in the query, but it implemented the same only
for the update/delete cases.  If we decide that that's the right thing to
do, then I will implement the same behavior for the insert case too.

Thanks,
Amit

[1] https://www.postgresql.org/message-id/CA%2BTgmoatBYy8Hyi3cYR1rFrCkD2NM4ZLZcck4QDGvH%3DHddfDwA%40mail.gmail.com
Re: [HACKERS] Declarative partitioning - another take
On 2017/04/26 3:58, Robert Haas wrote:
> On Mon, Apr 24, 2017 at 6:43 AM, Amit Langote wrote:
>> The reason it doesn't work is that we do not allocate ResultRelInfos for
>> partitioned tables (not even for the root partitioned table in the
>> update/delete cases), because the current implementation assumes they are
>> not required.  That's fine only so long as we consider that no rows are
>> inserted into them, no indexes need to be updated, and that no row-level
>> triggers are to be fired.  But it misses the fact that we do allow
>> statement-level triggers on partitioned tables.  So, we should fix things
>> such that ResultRelInfos are allocated so that any statement-level
>> triggers are fired.  But there are following questions to consider:
>>
>> 1. Should we consider only the root partitioned table or all partitioned
>>    tables in a given hierarchy, including the child partitioned tables?
>>    Note that this is related to a current limitation of updating/deleting
>>    inheritance hierarchies that we do not currently fire per-statement
>>    triggers of the child tables.  See the TODO item in wiki:
>>    https://wiki.postgresql.org/wiki/Todo#Triggers, which says: "When
>>    statement-level triggers are defined on a parent table, have them fire
>>    only on the parent table, and fire child table triggers only where
>>    appropriate"
>>
>> 2. Do we apply the same to inserts on the partitioned tables?  Since
>>    insert on a partitioned table implicitly affects its partitions, one
>>    might say that we would need to fire per-statement insert triggers of
>>    all the partitions.
>
> It seems to me that it doesn't make a lot of sense to fire the
> triggers for some tables involved in the hierarchy and not others.  I
> suppose the consistent thing to do here is to make sure that we fire
> the statement triggers for all tables in the partitioning hierarchy
> for all operations (insert, update, delete, etc.).
>
> TBH, I don't like that very much.  I'd rather fire the triggers only
> for the table actually named in the query and skip all the others,
> mostly because it seems like it would be faster and less likely to
> block future optimizations; eventually, I'd like to consider not even
> locking the children we're not touching, but that's not going to work
> if we have to check them all for triggers.  So what I'd prefer -- on
> the totally unprincipled basis that it would let us improve
> performance in the future -- if you operate on a partition directly,
> you fire the partition's triggers, but if you operate on the parent,
> only the parent's triggers fire.

FWIW, I too prefer the latter, that is, fire only the parent's triggers.
In that case, applying only the patch 0001 will do.

Thanks,
Amit
Re: [HACKERS] Declarative partitioning - another take
On Mon, Apr 24, 2017 at 6:43 AM, Amit Langote wrote:
> The reason it doesn't work is that we do not allocate ResultRelInfos for
> partitioned tables (not even for the root partitioned table in the
> update/delete cases), because the current implementation assumes they are
> not required.  That's fine only so long as we consider that no rows are
> inserted into them, no indexes need to be updated, and that no row-level
> triggers are to be fired.  But it misses the fact that we do allow
> statement-level triggers on partitioned tables.  So, we should fix things
> such that ResultRelInfos are allocated so that any statement-level
> triggers are fired.  But there are following questions to consider:
>
> 1. Should we consider only the root partitioned table or all partitioned
>    tables in a given hierarchy, including the child partitioned tables?
>    Note that this is related to a current limitation of updating/deleting
>    inheritance hierarchies that we do not currently fire per-statement
>    triggers of the child tables.  See the TODO item in wiki:
>    https://wiki.postgresql.org/wiki/Todo#Triggers, which says: "When
>    statement-level triggers are defined on a parent table, have them fire
>    only on the parent table, and fire child table triggers only where
>    appropriate"
>
> 2. Do we apply the same to inserts on the partitioned tables?  Since
>    insert on a partitioned table implicitly affects its partitions, one
>    might say that we would need to fire per-statement insert triggers of
>    all the partitions.

It seems to me that it doesn't make a lot of sense to fire the
triggers for some tables involved in the hierarchy and not others.  I
suppose the consistent thing to do here is to make sure that we fire
the statement triggers for all tables in the partitioning hierarchy
for all operations (insert, update, delete, etc.).

TBH, I don't like that very much.  I'd rather fire the triggers only
for the table actually named in the query and skip all the others,
mostly because it seems like it would be faster and less likely to
block future optimizations; eventually, I'd like to consider not even
locking the children we're not touching, but that's not going to work
if we have to check them all for triggers.  So what I'd prefer -- on
the totally unprincipled basis that it would let us improve
performance in the future -- if you operate on a partition directly,
you fire the partition's triggers, but if you operate on the parent,
only the parent's triggers fire.

How heretical is that idea?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
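
A sketch of the behavior argued for above (and ultimately adopted in the
thread), with table and trigger names made up for the example: with
statement-level triggers on both the parent and a partition, a statement
naming the parent fires only the parent's statement-level trigger, while a
statement naming the partition directly fires only that partition's.

CREATE TABLE parted (a int) PARTITION BY LIST (a);
CREATE TABLE parted_1 PARTITION OF parted FOR VALUES IN (1);

CREATE FUNCTION report_stmt() RETURNS trigger AS $$
BEGIN
    RAISE NOTICE 'statement-level trigger fired on %', TG_TABLE_NAME;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER parted_del AFTER DELETE ON parted
    FOR EACH STATEMENT EXECUTE PROCEDURE report_stmt();
CREATE TRIGGER parted_1_del AFTER DELETE ON parted_1
    FOR EACH STATEMENT EXECUTE PROCEDURE report_stmt();

DELETE FROM parted;     -- under the proposal: only parted's trigger fires
DELETE FROM parted_1;   -- only parted_1's trigger fires
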
Re: [HACKERS] Declarative partitioning - another take
On Mon, Apr 24, 2017 at 4:13 PM, Amit Langote wrote: > Hi Rajkumar, > > It would be great if you could check if the patches fix the issues. > Hi Amit, Thanks for looking into it. I have applied fixes and checked for triggers. I could see difference in behaviour of statement triggers for INSERT and UPDATE, for insert only root partition triggers are getting fired but for update root as well as child partition table triggers both getting fired. is this expected?? Below are steps to reproduce. CREATE TABLE pt (a INT, b INT) PARTITION BY RANGE(a); CREATE TABLE pt1 PARTITION OF pt FOR VALUES FROM (1) to (6); CREATE TABLE pt2 PARTITION OF pt FOR VALUES FROM (6) to (11); INSERT INTO pt (a,b) SELECT i,i FROM generate_series(1,7)i; CREATE TABLE pt_trigger(TG_NAME varchar,TG_TABLE_NAME varchar,TG_LEVEL varchar,TG_WHEN varchar,a_old int,a_new int,b_old int,b_new int); CREATE FUNCTION process_pt_trigger() RETURNS TRIGGER AS $ttp$ BEGIN IF (TG_OP = 'INSERT') THEN IF (TG_LEVEL = 'STATEMENT') THEN INSERT INTO pt_trigger SELECT TG_NAME,TG_TABLE_NAME,TG_LEVEL,TG_WHEN,NULL,NULL,NULL,NULL; END IF; IF (TG_LEVEL = 'ROW') THEN INSERT INTO pt_trigger SELECT TG_NAME,TG_TABLE_NAME,TG_LEVEL,TG_WHEN,NULL,NEW.a,NULL,NEW.b; END IF; RETURN NEW; END IF; IF (TG_OP = 'UPDATE') THEN IF (TG_LEVEL = 'STATEMENT') THEN INSERT INTO pt_trigger SELECT TG_NAME,TG_TABLE_NAME,TG_LEVEL,TG_WHEN,NULL,NULL,NULL,NULL; END IF; IF (TG_LEVEL = 'ROW') THEN INSERT INTO pt_trigger SELECT TG_NAME,TG_TABLE_NAME,TG_LEVEL,TG_WHEN,OLD.a,NEW.a,OLD.b,NEW.b; END IF; RETURN NEW; END IF; END; $ttp$ LANGUAGE plpgsql; CREATE TRIGGER trigger_test11 AFTER INSERT OR UPDATE ON pt FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger(); CREATE TRIGGER trigger_test12 AFTER INSERT OR UPDATE ON pt1 FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger(); CREATE TRIGGER trigger_test13 AFTER INSERT OR UPDATE ON pt2 FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger(); CREATE TRIGGER trigger_test21 BEFORE INSERT OR UPDATE ON pt FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger(); CREATE TRIGGER trigger_test22 BEFORE INSERT OR UPDATE ON pt1 FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger(); CREATE TRIGGER trigger_test23 BEFORE INSERT OR UPDATE ON pt2 FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger(); CREATE TRIGGER trigger_test32 AFTER INSERT OR UPDATE ON pt1 FOR EACH ROW EXECUTE PROCEDURE process_pt_trigger(); CREATE TRIGGER trigger_test33 AFTER INSERT OR UPDATE ON pt2 FOR EACH ROW EXECUTE PROCEDURE process_pt_trigger(); CREATE TRIGGER trigger_test42 BEFORE INSERT OR UPDATE ON pt1 FOR EACH ROW EXECUTE PROCEDURE process_pt_trigger(); CREATE TRIGGER trigger_test43 BEFORE INSERT OR UPDATE ON pt2 FOR EACH ROW EXECUTE PROCEDURE process_pt_trigger(); postgres=# INSERT INTO pt (a,b) VALUES (8,8); INSERT 0 1 postgres=# SELECT * FROM pt_trigger; tg_name | tg_table_name | tg_level | tg_when | a_old | a_new | b_old | b_new +---+---+-+---+---+---+--- trigger_test21 | pt| STATEMENT | BEFORE | | | | trigger_test43 | pt2 | ROW | BEFORE | | 8 | | 8 trigger_test33 | pt2 | ROW | AFTER | | 8 | | 8 trigger_test11 | pt| STATEMENT | AFTER | | | | (4 rows) postgres=# TRUNCATE TABLE pt_trigger; TRUNCATE TABLE postgres=# UPDATE pt SET a = 2 WHERE a = 1; UPDATE 1 postgres=# SELECT * FROM pt_trigger; tg_name | tg_table_name | tg_level | tg_when | a_old | a_new | b_old | b_new +---+---+-+---+---+---+--- trigger_test21 | pt| STATEMENT | BEFORE | | | | trigger_test22 | pt1 | STATEMENT | BEFORE | | | | trigger_test42 | pt1 | ROW | BEFORE | 1 | 2 | 1 | 1 
trigger_test32 | pt1 | ROW | AFTER | 1 | 2 | 1 | 1 trigger_test11 | pt| STATEMENT | AFTER | | | | trigger_test12 | pt1 | STATEMENT | AFTER | | | | (6 rows) Thanks & Regards, Rajkumar Raghuwanshi QMG, EnterpriseDB
Re: [HACKERS] Declarative partitioning - another take
Hi Rajkumar, Thanks for testing and the report. On 2017/04/21 17:00, Rajkumar Raghuwanshi wrote: > Hi, > > I have observed below with the statement triggers. > > I am able to create statement triggers at root partition, but these > triggers, not getting fired on updating partition. > > CREATE TABLE pt (a INT, b INT) PARTITION BY RANGE(a); > CREATE TABLE pt1 PARTITION OF pt FOR VALUES FROM (1) to (7); > CREATE TABLE pt2 PARTITION OF pt FOR VALUES FROM (7) to (11); > INSERT INTO pt (a,b) SELECT i,i FROM generate_series(1,10)i; > CREATE TABLE pt_trigger(TG_NAME varchar,TG_TABLE_NAME varchar,TG_LEVEL > varchar,TG_WHEN varchar); > CREATE OR REPLACE FUNCTION process_pt_trigger() RETURNS TRIGGER AS $pttg$ > BEGIN > IF (TG_OP = 'UPDATE') THEN > INSERT INTO pt_trigger SELECT > TG_NAME,TG_TABLE_NAME,TG_LEVEL,TG_WHEN; > RETURN NEW; > END IF; > RETURN NULL; > END; > $pttg$ LANGUAGE plpgsql; > CREATE TRIGGER pt_trigger_after_p0 AFTER UPDATE ON pt FOR EACH STATEMENT > EXECUTE PROCEDURE process_pt_trigger(); > CREATE TRIGGER pt_trigger_before_p0 BEFORE UPDATE ON pt FOR EACH STATEMENT > EXECUTE PROCEDURE process_pt_trigger(); > > postgres=# UPDATE pt SET a = 2 WHERE a = 1; > UPDATE 1 > postgres=# SELECT * FROM pt_trigger ORDER BY 1; > tg_name | tg_table_name | tg_level | tg_when > -+---+--+- > (0 rows) > > no statement level trigger fired in this case, is this expected behaviour?? I think we discussed this during the development and decided to allow per-statement triggers on partitioned tables [1], but it seems it doesn't quite work for update/delete as your example shows. You can however see that per-statement triggers of the root partitioned table *are* fired in the case of insert. The reason it doesn't work is that we do not allocate ResultRelInfos for partitioned tables (not even for the root partitioned table in the update/delete cases), because the current implementation assumes they are not required. That's fine only so long as we consider that no rows are inserted into them, no indexes need to be updated, and that no row-level triggers are to be fired. But it misses the fact that we do allow statement-level triggers on partitioned tables. So, we should fix things such that ResultRelInfos are allocated so that any statement-level triggers are fired. But there are following questions to consider: 1. Should we consider only the root partitioned table or all partitioned tables in a given hierarchy, including the child partitioned tables? Note that this is related to a current limitation of updating/deleting inheritance hierarchies that we do not currently fire per-statement triggers of the child tables. See the TODO item in wiki: https://wiki.postgresql.org/wiki/Todo#Triggers, which says: "When statement-level triggers are defined on a parent table, have them fire only on the parent table, and fire child table triggers only where appropriate" 2. Do we apply the same to inserts on the partitioned tables? Since insert on a partitioned table implicitly affects its partitions, one might say that we would need to fire per-statement insert triggers of all the partitions. Meanwhile, attached is a set of patches to fix this. 
Descriptions follow: 0001: fire per-statement triggers of the partitioned table mentioned in a given update or delete statement 0002: fire per-statement triggers of the child tables during update/delete of inheritance hierarchies (including the partitioned table case) Depending on the answer to 2 above, we can arrange for the per-statement triggers of all the partitions to be fired upon insert into the partitioned table. > but if i am creating triggers on leaf partition, trigger is getting fired. > > CREATE TRIGGER pt_trigger_after_p1 AFTER UPDATE ON pt1 FOR EACH STATEMENT > EXECUTE PROCEDURE process_pt_trigger(); > CREATE TRIGGER pt_trigger_before_p1 BEFORE UPDATE ON pt1 FOR EACH STATEMENT > EXECUTE PROCEDURE process_pt_trigger(); > > postgres=# UPDATE pt SET a = 5 WHERE a = 4; > UPDATE 1 > postgres=# SELECT * FROM pt_trigger ORDER BY 1; >tg_name| tg_table_name | tg_level | tg_when > --+---+---+- > pt_trigger_after_p1 | pt1 | STATEMENT | AFTER > pt_trigger_before_p1 | pt1 | STATEMENT | BEFORE > (2 rows) Actually, this works only by accident (with the current implementation, the *first* partition's triggers will get fired). If you create another partition, its per-statement triggers won't get fired. The patches described above fixes that. It would be great if you could check if the patches fix the issues. Also, adding this to the PostgreSQL 10 open items list. Thanks, Amit [1] https://www.postgresql.org/message-id/8e05817e-14c8-13e4-502b-e440adb24258%40lab.ntt.co.jp >From 9c7543615ccb05c004591a9176f818413df7ea9d Mon Sep 17 00:00:00 2001 From: amit Da
Re: [HACKERS] Declarative partitioning - another take
Hi, I have observed below with the statement triggers. I am able to create statement triggers at root partition, but these triggers, not getting fired on updating partition. CREATE TABLE pt (a INT, b INT) PARTITION BY RANGE(a); CREATE TABLE pt1 PARTITION OF pt FOR VALUES FROM (1) to (7); CREATE TABLE pt2 PARTITION OF pt FOR VALUES FROM (7) to (11); INSERT INTO pt (a,b) SELECT i,i FROM generate_series(1,10)i; CREATE TABLE pt_trigger(TG_NAME varchar,TG_TABLE_NAME varchar,TG_LEVEL varchar,TG_WHEN varchar); CREATE OR REPLACE FUNCTION process_pt_trigger() RETURNS TRIGGER AS $pttg$ BEGIN IF (TG_OP = 'UPDATE') THEN INSERT INTO pt_trigger SELECT TG_NAME,TG_TABLE_NAME,TG_LEVEL,TG_WHEN; RETURN NEW; END IF; RETURN NULL; END; $pttg$ LANGUAGE plpgsql; CREATE TRIGGER pt_trigger_after_p0 AFTER UPDATE ON pt FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger(); CREATE TRIGGER pt_trigger_before_p0 BEFORE UPDATE ON pt FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger(); postgres=# UPDATE pt SET a = 2 WHERE a = 1; UPDATE 1 postgres=# SELECT * FROM pt_trigger ORDER BY 1; tg_name | tg_table_name | tg_level | tg_when -+---+--+- (0 rows) no statement level trigger fired in this case, is this expected behaviour?? but if i am creating triggers on leaf partition, trigger is getting fired. CREATE TRIGGER pt_trigger_after_p1 AFTER UPDATE ON pt1 FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger(); CREATE TRIGGER pt_trigger_before_p1 BEFORE UPDATE ON pt1 FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger(); postgres=# UPDATE pt SET a = 5 WHERE a = 4; UPDATE 1 postgres=# SELECT * FROM pt_trigger ORDER BY 1; tg_name| tg_table_name | tg_level | tg_when --+---+---+- pt_trigger_after_p1 | pt1 | STATEMENT | AFTER pt_trigger_before_p1 | pt1 | STATEMENT | BEFORE (2 rows) Thanks & Regards, Rajkumar Raghuwanshi QMG, EnterpriseDB Corporation
Re: [HACKERS] Declarative partitioning vs. information_schema
On 2017/04/14 5:28, Robert Haas wrote: > On Thu, Apr 6, 2017 at 3:14 AM, Amit Langote > wrote: >>> The bulk of operations that work on traditional tables also work on >>> partitions >>> and partitioned tables. The next closest kind of relation, a materialized >>> view, is far less table-like. Therefore, I recommend showing both >>> partitions >>> and partitioned tables in those views. This is also consistent with the >>> decision to use words like "partition" and "partitioned" in messages only >>> when >>> partitioning is relevant to the error. For example, ATWrongRelkindError() >>> distinguishes materialized views from tables, but it does not distinguish >>> tables based on their participation in partitioning. >> >> +1 > > OK, whoever wants to write the patch, please step forward. Sorry, perhaps I'm missing something, but I thought there was no patch left to be written, because the original patch (this thread) implemented what Noah recommended. As of HEAD (6cfaffc0ddc): create table p (a int, b char) partition by list (a); create table p1 partition of p for values in (1) partition by list (b); create table p1a partition of p1 for values in ('a'); \d List of relations Schema | Name | Type | Owner +--+---+--- public | p| table | amit public | p1 | table | amit public | p1a | table | amit (3 rows) select tablename from pg_tables where schemaname = 'public'; tablename --- p p1 p1a (3 rows) select table_name from information_schema.tables where table_schema = 'public'; table_name p p1 p1a (3 rows) Also, it seems that this open item has been listed under Non-bugs, with remark "firm support for status quo, lack of firm support for alternatives". Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning vs. information_schema
On Thu, Apr 6, 2017 at 3:14 AM, Amit Langote wrote: >> The bulk of operations that work on traditional tables also work on >> partitions >> and partitioned tables. The next closest kind of relation, a materialized >> view, is far less table-like. Therefore, I recommend showing both partitions >> and partitioned tables in those views. This is also consistent with the >> decision to use words like "partition" and "partitioned" in messages only >> when >> partitioning is relevant to the error. For example, ATWrongRelkindError() >> distinguishes materialized views from tables, but it does not distinguish >> tables based on their participation in partitioning. > > +1 OK, whoever wants to write the patch, please step forward. /me steps backward. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On 07.04.2017 13:05, Etsuro Fujita wrote: On 2016/12/14 16:20, Etsuro Fujita wrote: On 2016/12/09 19:46, Maksim Milyutin wrote: I would like to work on two tasks: - insert (and eventually update) tuple routing for foreign partition. - the ability to create an index on the parent and have all of the children inherit it; The first one has been implemented in pg_pathman somehow, but the code relies on dirty hacks, so the FDW API has to be improved. As for the extended index support, it doesn't look like a super-hard task. That would be great! I'd like to help review the first one. There seems to be no work on the first one, so I'd like to work on that. Yes, you can start to work on this, I'll join later as a reviewer. -- Maksim Milyutin Postgres Professional: http://www.postgrespro.com Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On 2016/12/14 16:20, Etsuro Fujita wrote: On 2016/12/09 19:46, Maksim Milyutin wrote: I would like to work on two tasks: - insert (and eventually update) tuple routing for foreign partition. - the ability to create an index on the parent and have all of the children inherit it; The first one has been implemented in pg_pathman somehow, but the code relies on dirty hacks, so the FDW API has to be improved. As for the extended index support, it doesn't look like a super-hard task. That would be great! I'd like to help review the first one. There seems to be no work on the first one, so I'd like to work on that. Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning vs. information_schema
On 2017/04/06 16:02, Noah Misch wrote: > On Wed, Jan 25, 2017 at 01:19:00PM -0500, Robert Haas wrote: >> On Wed, Jan 25, 2017 at 1:04 PM, Peter Eisentraut >> wrote: >>> On 1/18/17 2:32 PM, Robert Haas wrote: Unless we can find something official, I suppose we should just display BASE TABLE in that case as we do in other cases. I wonder if the schema needs some broader revision; for example, are there information_schema elements intended to show information about partitions? >>> >>> Is it intentional that we show the partitions by default in \d, >>> pg_tables, information_schema.tables? Or should we treat those as >>> somewhat-hidden details? >> >> I'm not really sure what the right thing to do is there. I was hoping >> you had an opinion. > > The bulk of operations that work on traditional tables also work on partitions > and partitioned tables. The next closest kind of relation, a materialized > view, is far less table-like. Therefore, I recommend showing both partitions > and partitioned tables in those views. This is also consistent with the > decision to use words like "partition" and "partitioned" in messages only when > partitioning is relevant to the error. For example, ATWrongRelkindError() > distinguishes materialized views from tables, but it does not distinguish > tables based on their participation in partitioning. +1 Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning vs. information_schema
On Wed, Jan 25, 2017 at 01:19:00PM -0500, Robert Haas wrote: > On Wed, Jan 25, 2017 at 1:04 PM, Peter Eisentraut > wrote: > > On 1/18/17 2:32 PM, Robert Haas wrote: > >> Unless we can find something official, I suppose we should just > >> display BASE TABLE in that case as we do in other cases. I wonder if > >> the schema needs some broader revision; for example, are there > >> information_schema elements intended to show information about > >> partitions? > > > > Is it intentional that we show the partitions by default in \d, > > pg_tables, information_schema.tables? Or should we treat those as > > somewhat-hidden details? > > I'm not really sure what the right thing to do is there. I was hoping > you had an opinion. The bulk of operations that work on traditional tables also work on partitions and partitioned tables. The next closest kind of relation, a materialized view, is far less table-like. Therefore, I recommend showing both partitions and partitioned tables in those views. This is also consistent with the decision to use words like "partition" and "partitioned" in messages only when partitioning is relevant to the error. For example, ATWrongRelkindError() distinguishes materialized views from tables, but it does not distinguish tables based on their participation in partitioning. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning vs. information_schema
On 2017/01/26 3:19, Robert Haas wrote: > On Wed, Jan 25, 2017 at 1:04 PM, Peter Eisentraut > wrote: >> On 1/18/17 2:32 PM, Robert Haas wrote: >>> Unless we can find something official, I suppose we should just >>> display BASE TABLE in that case as we do in other cases. I wonder if >>> the schema needs some broader revision; for example, are there >>> information_schema elements intended to show information about >>> partitions? >> >> Is it intentional that we show the partitions by default in \d, >> pg_tables, information_schema.tables? Or should we treat those as >> somewhat-hidden details? > > I'm not really sure what the right thing to do is there. I was hoping > you had an opinion. I guess this is an open item then. I think Greg Stark brought this up too on the original partitioning thread [1]. Thanks, Amit [1] https://www.postgresql.org/message-id/CAM-w4HOZ5fPS7GoCTTrW42q01%2BwPrOWFCnr9H0iDyVTZP2H1CA%40mail.gmail.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning optimization for large amount of partitions
On Fri, Mar 24, 2017 at 11:06 PM, Simon Riggs wrote: > On 1 March 2017 at 01:36, Amit Langote wrote: > >> I don't know which way you're thinking of fixing this, but a planner patch >> to implement faster partition-pruning will have taken care of this, I >> think. As you may know, even declarative partitioned tables currently >> depend on constraint exclusion for partition-pruning and planner's current >> approach of handling inheritance requires to open all the child tables >> (partitions), whereas the new approach hopefully shouldn't need to do >> that. I am not sure if looking for a more localized fix for this would be >> worthwhile, although I may be wrong. > > What "new approach" are we discussing? > > Is there a patch or design discussion? Neither at the moment. As Aleksander said in his reply I was referring to Dmitry Ivanov's plan of porting pg_pathman's planner functionality to core that he mentioned on the declarative partitioning thread back in December [1]. Thanks, Amit [1] https://www.postgresql.org/message-id/426b2b01-61e0-43aa-bd84-c6fcf516f1c3%40postgrespro.ru -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning optimization for large amount of partitions
Hi Simon, > > I don't know which way you're thinking of fixing this, but a planner patch > > to implement faster partition-pruning will have taken care of this, I > > think. As you may know, even declarative partitioned tables currently > > depend on constraint exclusion for partition-pruning and planner's current > > approach of handling inheritance requires to open all the child tables > > (partitions), whereas the new approach hopefully shouldn't need to do > > that. I am not sure if looking for a more localized fix for this would be > > worthwhile, although I may be wrong. > > What "new approach" are we discussing? > Is there a patch or design discussion? I think what was meant was plans of my colleague Dmitry Ivanov to implement partition-pruning. I've just spoke with Dmitry about this matter. Unless there is anyone else who is already working on this optimization we would like to work on it together. Unfortunately there is no patch or design discussion of partition-pruning on this commitfest. -- Best regards, Aleksander Alekseev signature.asc Description: PGP signature
Re: [HACKERS] Declarative partitioning optimization for large amount of partitions
On 1 March 2017 at 01:36, Amit Langote wrote: > I don't know which way you're thinking of fixing this, but a planner patch > to implement faster partition-pruning will have taken care of this, I > think. As you may know, even declarative partitioned tables currently > depend on constraint exclusion for partition-pruning and planner's current > approach of handling inheritance requires to open all the child tables > (partitions), whereas the new approach hopefully shouldn't need to do > that. I am not sure if looking for a more localized fix for this would be > worthwhile, although I may be wrong. What "new approach" are we discussing? Is there a patch or design discussion? -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning optimization for large amount of partitions
Hi Tels, Thanks a lot for the review! > "corresponding" Fixed. > Also a question: Some one-line comments are > > /* Comment. */ > > while others are > > /* > * Comment. > */ > > Why the difference? I'm trying to follow a code stile used in a code I'm modifying. In this case I got an impression that second style of one-line comments is used more often an I tried to follow it but apparently I didn't succeed :) This is fixed as well. On Thu, Mar 09, 2017 at 06:40:39PM -0500, Tels wrote: > Hi Aleksander, > > noticed this in your patch: > > > * Add a corespinding entry to pgStatTabHash. > > "corresponding" > > Also a question: Some one-line comments are > > /* Comment. */ > > while others are > > /* > * Comment. > */ > > Why the difference? > > Hope this helps, > > Tels -- Best regards, Aleksander Alekseev diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index 7cacb1e9b2..a22a5a37c8 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -161,6 +161,20 @@ typedef struct TabStatusArray static TabStatusArray *pgStatTabList = NULL; /* + * pgStatTabHash entry + */ +typedef struct TabStatHashEntry +{ + Oid t_id; + PgStat_TableStatus* tsa_entry; +} TabStatHashEntry; + +/* + * Hash table for O(1) t_id -> tsa_entry lookup + */ +static HTAB *pgStatTabHash = NULL; + +/* * Backends store per-function info that's waiting to be sent to the collector * in this hash table (indexed by function OID). */ @@ -815,7 +829,13 @@ pgstat_report_stat(bool force) pgstat_send_tabstat(this_msg); this_msg->m_nentries = 0; } + + /* + * Entry will be zeroed soon so we need to remove it from the lookup table. + */ + hash_search(pgStatTabHash, &entry->t_id, HASH_REMOVE, NULL); } + /* zero out TableStatus structs after use */ MemSet(tsa->tsa_entries, 0, tsa->tsa_used * sizeof(PgStat_TableStatus)); @@ -1667,59 +1687,77 @@ pgstat_initstats(Relation rel) } /* + * Make sure pgStatTabList and pgStatTabHash are initialized. + */ +static void +make_sure_stat_tab_initialized() +{ + HASHCTL ctl; + + if (pgStatTabList) + return; + + Assert(!pgStatTabHash); + + memset(&ctl, 0, sizeof(ctl)); + ctl.keysize = sizeof(Oid); + ctl.entrysize = sizeof(TabStatHashEntry); + ctl.hcxt = TopMemoryContext; + + pgStatTabHash = hash_create("pgstat t_id to tsa_entry lookup table", + TABSTAT_QUANTUM, &ctl, HASH_ELEM | HASH_BLOBS | HASH_CONTEXT); + + pgStatTabList = (TabStatusArray *) MemoryContextAllocZero(TopMemoryContext, +sizeof(TabStatusArray)); +} + +/* * get_tabstat_entry - find or create a PgStat_TableStatus entry for rel */ static PgStat_TableStatus * get_tabstat_entry(Oid rel_id, bool isshared) { + TabStatHashEntry* hash_entry; PgStat_TableStatus *entry; TabStatusArray *tsa; - TabStatusArray *prev_tsa; - int i; + bool found; + + make_sure_stat_tab_initialized(); + + /* + * Find an entry or create a new one. + */ + hash_entry = hash_search(pgStatTabHash, &rel_id, HASH_ENTER, &found); + if(found) + return hash_entry->tsa_entry; /* - * Search the already-used tabstat slots for this relation. + * `hash_entry` was just created and now we have to fill it. + * First make sure there is a free space in a last element of pgStatTabList. 
*/ - prev_tsa = NULL; - for (tsa = pgStatTabList; tsa != NULL; prev_tsa = tsa, tsa = tsa->tsa_next) + tsa = pgStatTabList; + while(tsa->tsa_used == TABSTAT_QUANTUM) { - for (i = 0; i < tsa->tsa_used; i++) + if(tsa->tsa_next == NULL) { - entry = &tsa->tsa_entries[i]; - if (entry->t_id == rel_id) -return entry; + tsa->tsa_next = (TabStatusArray *) MemoryContextAllocZero(TopMemoryContext, + sizeof(TabStatusArray)); } - if (tsa->tsa_used < TABSTAT_QUANTUM) - { - /* - * It must not be present, but we found a free slot instead. Fine, - * let's use this one. We assume the entry was already zeroed, - * either at creation or after last use. - */ - entry = &tsa->tsa_entries[tsa->tsa_used++]; - entry->t_id = rel_id; - entry->t_shared = isshared; - return entry; - } + tsa = tsa->tsa_next; } /* - * We ran out of tabstat slots, so allocate more. Be sure they're zeroed. - */ - tsa = (TabStatusArray *) MemoryContextAllocZero(TopMemoryContext, - sizeof(TabStatusArray)); - if (prev_tsa) - prev_tsa->tsa_next = tsa; - else - pgStatTabList = tsa; - - /* - * Use the first entry of the new TabStatusArray. + * Add an entry. */ entry = &tsa->tsa_entries[tsa->tsa_used++]; entry->t_id = rel_id; entry->t_shared = isshared; + + /* + * Add a corresponding entry to pgStatTabHash. + */ + hash_entry->tsa_entry = entry; return entry; } @@ -1731,22 +1769,19 @@ get_tabstat_entry(Oid rel_id, bool isshared) PgStat_TableStatus * find_tabstat_entry(Oid rel_id) { - PgStat_TableStatus *entry; - TabStatusArray *tsa; - int i; + TabStatHashEntry* hash_entry; - for (tsa = pgStatTabList; tsa
Re: [HACKERS] Declarative partitioning optimization for large amount of partitions
Hi Aleksander, noticed this in your patch: > * Add a corespinding entry to pgStatTabHash. "corresponding" Also a question: Some one-line comments are /* Comment. */ while others are /* * Comment. */ Why the difference? Hope this helps, Tels PS: Sorry if this appears twice, I fumbled the From: ... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning optimization for large amount of partitions
Hi Aleksander, noticed this in your patch: > * Add a corespinding entry to pgStatTabHash. "corresponding" Also a question: Some one-line comments are /* Comment. */ while others are /* * Comment. */ Why the difference? Hope this helps, Tels -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning optimization for large amount of partitions
Hi Amit, > Sorry, I didn't mean to dissuade you from trying those > micro-optimizations. If general inheritance cases can benefit from it > (which, until we have a different method, will be used by partitioned > tables as well), I think we should try it. OK, I'll see what could be done here as well then. On Tue, Mar 07, 2017 at 10:55:12AM +0900, Amit Langote wrote: > Hi Aleksander, > > On 2017/03/07 0:22, Aleksander Alekseev wrote: > > Hello. > > > > OK, here is a patch. > > > > Benchmark, before: > > > > ``` > > number of transactions actually processed: 1823 > > latency average = 1153.495 ms > > latency stddev = 154.366 ms > > tps = 6.061104 (including connections establishing) > > tps = 6.061211 (excluding connections establishing) > > ``` > > > > Benchmark, after: > > > > ``` > > number of transactions actually processed: 2462 > > latency average = 853.862 ms > > latency stddev = 112.270 ms > > tps = 8.191861 (including connections establishing) > > tps = 8.192028 (excluding connections establishing) > > ``` > > > > +35% TPS, just as expected. Feel free to run your own benchmarks on > > different datasets and workloads. `perf top` shows that first bottleneck > > is completely eliminated. > > That seems like a good gain. > > > I did nothing about the second bottleneck > > since as Amit mentioned partition-pruning should solve this anyway and > > apparently any micro-optimizations don't worth an effort. > > Sorry, I didn't mean to dissuade you from trying those > micro-optimizations. If general inheritance cases can benefit from it > (which, until we have a different method, will be used by partitioned > tables as well), I think we should try it. > > Thanks, > Amit > > -- Best regards, Aleksander Alekseev signature.asc Description: PGP signature
Re: [HACKERS] Declarative partitioning optimization for large amount of partitions
Hi, Andres Thanks a lot for the review! > Why are we keeping that list / the "batch" allocation pattern? It > doesn't actually seem useful to me after these changes. Given that we > don't shrink hash-tables, we could just use the hash-table memory for > this, no? I don't think we can do that. According to comments: ``` * NOTE: once allocated, TabStatusArray structures are never moved or deleted [...] * This allows relcache pgstat_info pointers to be treated as long-lived data, * avoiding repeated searches in pgstat_initstats() when a relation is * repeatedly opened during a transaction. ``` It is my understanding that dynahash can't give same guarantees. > 'faster ... lookup' doesn't strike me as a useful comment, because it's > only useful in relation of the current code to the new code. Agree, fixed to 'O(1) ... lookup'. > It's not really freed... Right, it's actually zeroed. Fixed. > Arguably it'd be better to HASH_ENTER directly here, instead of doing > two lookups. Good point. Fixed. > Think it'd be better to move the hash creation into its own function. Agree, fixed. On Mon, Mar 06, 2017 at 12:01:50PM -0800, Andres Freund wrote: > Hi, > > This issue has bothered me in non-partitioned use-cases recently, so > thanks for taking it up. > > > On 2017-03-06 18:22:17 +0300, Aleksander Alekseev wrote: > > diff --git a/src/backend/postmaster/pgstat.c > > b/src/backend/postmaster/pgstat.c > > index 2fb9a8bf58..fa906e7930 100644 > > --- a/src/backend/postmaster/pgstat.c > > +++ b/src/backend/postmaster/pgstat.c > > @@ -160,6 +160,16 @@ typedef struct TabStatusArray > > > > static TabStatusArray *pgStatTabList = NULL; > > Why are we keeping that list / the "batch" allocation pattern? It > doesn't actually seem useful to me after these changes. Given that we > don't shrink hash-tables, we could just use the hash-table memory for > this, no? > > I think a separate list that only contains things that are "active" in > the current transaction makes sense, because the current logic requires > iterating over a potentially very large array at transaction commit. > > > > +/* pgStatTabHash entry */ > > +typedef struct TabStatHashEntry > > +{ > > + Oid t_id; > > + PgStat_TableStatus* tsa_entry; > > +} TabStatHashEntry; > > + > > +/* Hash table for faster t_id -> tsa_entry lookup */ > > +static HTAB *pgStatTabHash = NULL; > > + > > 'faster ... lookup' doesn't strike me as a useful comment, because it's > only useful in relation of the current code to the new code. > > > > > /* > > * Backends store per-function info that's waiting to be sent to the > > collector > > * in this hash table (indexed by function OID). > > @@ -815,7 +825,13 @@ pgstat_report_stat(bool force) > > pgstat_send_tabstat(this_msg); > > this_msg->m_nentries = 0; > > } > > + > > + /* > > +* Entry will be freed soon so we need to remove it > > from the lookup table. > > +*/ > > + hash_search(pgStatTabHash, &entry->t_id, HASH_REMOVE, > > NULL); > > } > > It's not really freed... > > > > static PgStat_TableStatus * > > get_tabstat_entry(Oid rel_id, bool isshared) > > { > > + TabStatHashEntry* hash_entry; > > PgStat_TableStatus *entry; > > TabStatusArray *tsa; > > - TabStatusArray *prev_tsa; > > - int i; > > + > > + /* Try to find an entry */ > > + entry = find_tabstat_entry(rel_id); > > + if(entry != NULL) > > + return entry; > > Arguably it'd be better to HASH_ENTER directly here, instead of doing > two lookups. > > > > /* > > -* Search the already-used tabstat slots for this relation. > > +* Entry doesn't exist - creating one. 
> > +* First make sure there is a free space in a last element of > > pgStatTabList. > > */ > > - prev_tsa = NULL; > > - for (tsa = pgStatTabList; tsa != NULL; prev_tsa = tsa, tsa = > > tsa->tsa_next) > > + if (!pgStatTabList) > > { > > - for (i = 0; i < tsa->tsa_used; i++) > > - { > > - entry = &tsa->tsa_entries[i]; > > - if (entry->t_id == rel_id) > > - return entry; > > - } > > + HASHCTL ctl; > > > > - if (tsa->tsa_used < TABSTAT_QUANTUM) > > + Assert(!pgStatTabHash); > > + > > + memset(&ctl, 0, sizeof(ctl)); > > + ctl.keysize = sizeof(Oid); > > + ctl.entrysize = sizeof(TabStatHashEntry); > > + ctl.hcxt = TopMemoryContext; > > + > > + pgStatTabHash = hash_create("pgstat t_id to tsa_entry lookup > > table", > > + TABSTAT_QUANTUM, &ctl, HASH_ELEM | HASH_BLOBS | > > HASH_CONTEXT); > > Think it'd be better to move the hash creation into its own function. > > > - Andres -- Best regards, Aleksander Alekseev diff --git a/src/backend/postma
Re: [HACKERS] Declarative partitioning optimization for large amount of partitions
Hi Aleksander, On 2017/03/07 0:22, Aleksander Alekseev wrote: > Hello. > > OK, here is a patch. > > Benchmark, before: > > ``` > number of transactions actually processed: 1823 > latency average = 1153.495 ms > latency stddev = 154.366 ms > tps = 6.061104 (including connections establishing) > tps = 6.061211 (excluding connections establishing) > ``` > > Benchmark, after: > > ``` > number of transactions actually processed: 2462 > latency average = 853.862 ms > latency stddev = 112.270 ms > tps = 8.191861 (including connections establishing) > tps = 8.192028 (excluding connections establishing) > ``` > > +35% TPS, just as expected. Feel free to run your own benchmarks on > different datasets and workloads. `perf top` shows that first bottleneck > is completely eliminated. That seems like a good gain. > I did nothing about the second bottleneck > since as Amit mentioned partition-pruning should solve this anyway and > apparently any micro-optimizations don't worth an effort. Sorry, I didn't mean to dissuade you from trying those micro-optimizations. If general inheritance cases can benefit from it (which, until we have a different method, will be used by partitioned tables as well), I think we should try it. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning optimization for large amount of partitions
Hi, This issue has bothered me in non-partitioned use-cases recently, so thanks for taking it up. On 2017-03-06 18:22:17 +0300, Aleksander Alekseev wrote: > diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c > index 2fb9a8bf58..fa906e7930 100644 > --- a/src/backend/postmaster/pgstat.c > +++ b/src/backend/postmaster/pgstat.c > @@ -160,6 +160,16 @@ typedef struct TabStatusArray > > static TabStatusArray *pgStatTabList = NULL; Why are we keeping that list / the "batch" allocation pattern? It doesn't actually seem useful to me after these changes. Given that we don't shrink hash-tables, we could just use the hash-table memory for this, no? I think a separate list that only contains things that are "active" in the current transaction makes sense, because the current logic requires iterating over a potentially very large array at transaction commit. > +/* pgStatTabHash entry */ > +typedef struct TabStatHashEntry > +{ > + Oid t_id; > + PgStat_TableStatus* tsa_entry; > +} TabStatHashEntry; > + > +/* Hash table for faster t_id -> tsa_entry lookup */ > +static HTAB *pgStatTabHash = NULL; > + 'faster ... lookup' doesn't strike me as a useful comment, because it's only useful in relation of the current code to the new code. > /* > * Backends store per-function info that's waiting to be sent to the > collector > * in this hash table (indexed by function OID). > @@ -815,7 +825,13 @@ pgstat_report_stat(bool force) > pgstat_send_tabstat(this_msg); > this_msg->m_nentries = 0; > } > + > + /* > + * Entry will be freed soon so we need to remove it > from the lookup table. > + */ > + hash_search(pgStatTabHash, &entry->t_id, HASH_REMOVE, > NULL); > } It's not really freed... > static PgStat_TableStatus * > get_tabstat_entry(Oid rel_id, bool isshared) > { > + TabStatHashEntry* hash_entry; > PgStat_TableStatus *entry; > TabStatusArray *tsa; > - TabStatusArray *prev_tsa; > - int i; > + > + /* Try to find an entry */ > + entry = find_tabstat_entry(rel_id); > + if(entry != NULL) > + return entry; Arguably it'd be better to HASH_ENTER directly here, instead of doing two lookups. > /* > - * Search the already-used tabstat slots for this relation. > + * Entry doesn't exist - creating one. > + * First make sure there is a free space in a last element of > pgStatTabList. >*/ > - prev_tsa = NULL; > - for (tsa = pgStatTabList; tsa != NULL; prev_tsa = tsa, tsa = > tsa->tsa_next) > + if (!pgStatTabList) > { > - for (i = 0; i < tsa->tsa_used; i++) > - { > - entry = &tsa->tsa_entries[i]; > - if (entry->t_id == rel_id) > - return entry; > - } > + HASHCTL ctl; > > - if (tsa->tsa_used < TABSTAT_QUANTUM) > + Assert(!pgStatTabHash); > + > + memset(&ctl, 0, sizeof(ctl)); > + ctl.keysize = sizeof(Oid); > + ctl.entrysize = sizeof(TabStatHashEntry); > + ctl.hcxt = TopMemoryContext; > + > + pgStatTabHash = hash_create("pgstat t_id to tsa_entry lookup > table", > + TABSTAT_QUANTUM, &ctl, HASH_ELEM | HASH_BLOBS | > HASH_CONTEXT); Think it'd be better to move the hash creation into its own function. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning optimization for large amount of partitions
Hello. OK, here is a patch. Benchmark, before: ``` number of transactions actually processed: 1823 latency average = 1153.495 ms latency stddev = 154.366 ms tps = 6.061104 (including connections establishing) tps = 6.061211 (excluding connections establishing) ``` Benchmark, after: ``` number of transactions actually processed: 2462 latency average = 853.862 ms latency stddev = 112.270 ms tps = 8.191861 (including connections establishing) tps = 8.192028 (excluding connections establishing) ``` +35% TPS, just as expected. Feel free to run your own benchmarks on different datasets and workloads. `perf top` shows that first bottleneck is completely eliminated. I did nothing about the second bottleneck since as Amit mentioned partition-pruning should solve this anyway and apparently any micro-optimizations don't worth an effort. Also I checked test coverage using lcov to make sure that this code is test covered. An exact script I'm using could be found here [1]. [1] https://github.com/afiskon/pgscripts/blob/master/code-coverage.sh On Wed, Mar 01, 2017 at 10:36:24AM +0900, Amit Langote wrote: > Hi, > > On 2017/02/28 23:25, Aleksander Alekseev wrote: > > Hello. > > > > I decided to figure out whether current implementation of declarative > > partitioning has any bottlenecks when there is a lot of partitions. Here > > is what I did [1]. > > Thanks for sharing. > > > Then: > > > > ``` > > # 2580 is some pk that exists > > echo 'select * from part_test where pk = 2580;' > t.sql > > pgbench -j 7 -c 7 -f t.sql -P 1 -T 300 eax > > ``` > > > > `perf top` showed to bottlenecks [2]. A stacktrace for the first one > > looks like this [3]: > > > > ``` > > 0x007a42e2 in get_tabstat_entry (rel_id=25696, isshared=0 '\000') > > at pgstat.c:1689 > > 1689if (entry->t_id == rel_id) > > #0 0x007a42e2 in get_tabstat_entry (rel_id=25696, isshared=0 > > '\000') at pgstat.c:1689 > > #1 0x007a4275 in pgstat_initstats (rel=0x7f4af3fd41f8) at > > pgstat.c:1666 > > #2 0x004c7090 in relation_open (relationId=25696, lockmode=0) at > > heapam.c:1137 > > #3 0x004c72c9 in heap_open (relationId=25696, lockmode=0) at > > heapam.c:1291 > > (skipped) > > ``` > > > > And here is a stacktrace for the second bottleneck [4]: > > > > ``` > > 0x00584fb1 in find_all_inheritors (parentrelId=16393, lockmode=1, > > numparents=0x0) at pg_inherits.c:199 > > 199 forboth(lo, rels_list, li, rel_numparents) > > #0 0x00584fb1 in find_all_inheritors (parentrelId=16393, > > lockmode=1, numparents=0x0) at pg_inherits.c:199 > > #1 0x0077fc9f in expand_inherited_rtentry (root=0x1badcb8, > > rte=0x1b630b8, rti=1) at prepunion.c:1408 > > #2 0x0077fb67 in expand_inherited_tables (root=0x1badcb8) at > > prepunion.c:1335 > > #3 0x00767526 in subquery_planner (glob=0x1b63cc0, > > parse=0x1b62fa0, parent_root=0x0, hasRecursion=0 '\000', tuple_fraction=0) > > at planner.c:568 > > (skipped) > > ``` > > > > The first one could be easily fixed by introducing a hash table > > (rel_id -> pgStatList entry). Perhaps hash table should be used only > > after some threshold. Unless there are any objections I will send a > > corresponding patch shortly. > > I have never thought about this one really. > > > I didn't explored the second bottleneck closely yet but at first glance > > it doesn't look much more complicated. > > I don't know which way you're thinking of fixing this, but a planner patch > to implement faster partition-pruning will have taken care of this, I > think. 
As you may know, even declarative partitioned tables currently > depend on constraint exclusion for partition-pruning and planner's current > approach of handling inheritance requires to open all the child tables > (partitions), whereas the new approach hopefully shouldn't need to do > that. I am not sure if looking for a more localized fix for this would be > worthwhile, although I may be wrong. > > Thanks, > Amit > > -- Best regards, Aleksander Alekseev diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index 2fb9a8bf58..fa906e7930 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -160,6 +160,16 @@ typedef struct TabStatusArray static TabStatusArray *pgStatTabList = NULL; +/* pgStatTabHash entry */ +typedef struct TabStatHashEntry +{ + Oid t_id; + PgStat_TableStatus* tsa_entry; +} TabStatHashEntry; + +/* Hash table for faster t_id -> tsa_entry lookup */ +static HTAB *pgStatTabHash = NULL; + /* * Backends store per-function info that's waiting to be sent to the collector * in this hash table (indexed by function OID). @@ -815,7 +825,13 @@ pgstat_report_stat(bool force) pgstat_send_tabstat(this_msg); this_msg->m_nentries = 0; } + + /* + * Entry will be freed soon so we need to remove it from the lookup table. + */ + hash_search(pgStatTabH
Re: [HACKERS] Declarative partitioning optimization for large amount of partitions
Hi, On 2017/02/28 23:25, Aleksander Alekseev wrote: > Hello. > > I decided to figure out whether current implementation of declarative > partitioning has any bottlenecks when there is a lot of partitions. Here > is what I did [1]. Thanks for sharing. > Then: > > ``` > # 2580 is some pk that exists > echo 'select * from part_test where pk = 2580;' > t.sql > pgbench -j 7 -c 7 -f t.sql -P 1 -T 300 eax > ``` > > `perf top` showed to bottlenecks [2]. A stacktrace for the first one > looks like this [3]: > > ``` > 0x007a42e2 in get_tabstat_entry (rel_id=25696, isshared=0 '\000') at > pgstat.c:1689 > 1689 if (entry->t_id == rel_id) > #0 0x007a42e2 in get_tabstat_entry (rel_id=25696, isshared=0 '\000') > at pgstat.c:1689 > #1 0x007a4275 in pgstat_initstats (rel=0x7f4af3fd41f8) at > pgstat.c:1666 > #2 0x004c7090 in relation_open (relationId=25696, lockmode=0) at > heapam.c:1137 > #3 0x004c72c9 in heap_open (relationId=25696, lockmode=0) at > heapam.c:1291 > (skipped) > ``` > > And here is a stacktrace for the second bottleneck [4]: > > ``` > 0x00584fb1 in find_all_inheritors (parentrelId=16393, lockmode=1, > numparents=0x0) at pg_inherits.c:199 > 199 forboth(lo, rels_list, li, rel_numparents) > #0 0x00584fb1 in find_all_inheritors (parentrelId=16393, lockmode=1, > numparents=0x0) at pg_inherits.c:199 > #1 0x0077fc9f in expand_inherited_rtentry (root=0x1badcb8, > rte=0x1b630b8, rti=1) at prepunion.c:1408 > #2 0x0077fb67 in expand_inherited_tables (root=0x1badcb8) at > prepunion.c:1335 > #3 0x00767526 in subquery_planner (glob=0x1b63cc0, parse=0x1b62fa0, > parent_root=0x0, hasRecursion=0 '\000', tuple_fraction=0) at planner.c:568 > (skipped) > ``` > > The first one could be easily fixed by introducing a hash table > (rel_id -> pgStatList entry). Perhaps hash table should be used only > after some threshold. Unless there are any objections I will send a > corresponding patch shortly. I have never thought about this one really. > I didn't explored the second bottleneck closely yet but at first glance > it doesn't look much more complicated. I don't know which way you're thinking of fixing this, but a planner patch to implement faster partition-pruning will have taken care of this, I think. As you may know, even declarative partitioned tables currently depend on constraint exclusion for partition-pruning and planner's current approach of handling inheritance requires to open all the child tables (partitions), whereas the new approach hopefully shouldn't need to do that. I am not sure if looking for a more localized fix for this would be worthwhile, although I may be wrong. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Declarative partitioning optimization for large amount of partitions
Hello. I decided to figure out whether current implementation of declarative partitioning has any bottlenecks when there is a lot of partitions. Here is what I did [1]. ``` -- init schema \timing on CREATE TABLE part_test (pk int not null, k int, v varchar(128)) PARTITION BY RANGE(pk); do $$ declare i integer; begin for i in 1 .. 1 loop raise notice 'i = %', i; execute ('CREATE TABLE part_test_' || i || ' PARTITION OF part_test FOR VALUES FROM (' || (1 + (i-1)*1000) || ') to (' || ( (i * 1000) + 1) || ');' ); end loop; end $$; -- fill tables with some data do $$ declare i integer; begin for i in 1 .. 100*1000 loop raise notice 'i = %', i; execute ('insert into part_test values ( ceil(random()*(1-1)*1000), ceil(random()*1*1000), || ceil(random()*1*1000) );'); end loop; end $$; ``` Then: ``` # 2580 is some pk that exists echo 'select * from part_test where pk = 2580;' > t.sql pgbench -j 7 -c 7 -f t.sql -P 1 -T 300 eax ``` `perf top` showed to bottlenecks [2]. A stacktrace for the first one looks like this [3]: ``` 0x007a42e2 in get_tabstat_entry (rel_id=25696, isshared=0 '\000') at pgstat.c:1689 1689if (entry->t_id == rel_id) #0 0x007a42e2 in get_tabstat_entry (rel_id=25696, isshared=0 '\000') at pgstat.c:1689 #1 0x007a4275 in pgstat_initstats (rel=0x7f4af3fd41f8) at pgstat.c:1666 #2 0x004c7090 in relation_open (relationId=25696, lockmode=0) at heapam.c:1137 #3 0x004c72c9 in heap_open (relationId=25696, lockmode=0) at heapam.c:1291 (skipped) ``` And here is a stacktrace for the second bottleneck [4]: ``` 0x00584fb1 in find_all_inheritors (parentrelId=16393, lockmode=1, numparents=0x0) at pg_inherits.c:199 199 forboth(lo, rels_list, li, rel_numparents) #0 0x00584fb1 in find_all_inheritors (parentrelId=16393, lockmode=1, numparents=0x0) at pg_inherits.c:199 #1 0x0077fc9f in expand_inherited_rtentry (root=0x1badcb8, rte=0x1b630b8, rti=1) at prepunion.c:1408 #2 0x0077fb67 in expand_inherited_tables (root=0x1badcb8) at prepunion.c:1335 #3 0x00767526 in subquery_planner (glob=0x1b63cc0, parse=0x1b62fa0, parent_root=0x0, hasRecursion=0 '\000', tuple_fraction=0) at planner.c:568 (skipped) ``` The first one could be easily fixed by introducing a hash table (rel_id -> pgStatList entry). Perhaps hash table should be used only after some threshold. Unless there are any objections I will send a corresponding patch shortly. I didn't explored the second bottleneck closely yet but at first glance it doesn't look much more complicated. Please don't hesitate to share your thoughts regarding this matter. [1] http://afiskon.ru/s/e3/5f47af9102_benchmark.txt [2] http://afiskon.ru/s/00/2008c4ae66_temp.png [3] http://afiskon.ru/s/23/650f0afc89_stack.txt [4] http://afiskon.ru/s/03/a7e685a4db_stack2.txt -- Best regards, Aleksander Alekseev signature.asc Description: PGP signature
Re: [HACKERS] Declarative partitioning - another take
Hi Amit, On Thu, 23 Feb 2017 16:29:32 +0900 Amit Langote wrote: > Thanks for taking care of that. > > + * PartitionRoot relation descriptor for parent > relation > > Maybe: relation descriptor for root parent relation This seems better. Patch is updated. Thanks, -- Yugo Nagata diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h index 6332ea0..af3367a 100644 --- a/src/include/nodes/execnodes.h +++ b/src/include/nodes/execnodes.h @@ -326,6 +326,7 @@ typedef struct JunkFilter * onConflictSetWhere list of ON CONFLICT DO UPDATE exprs (qual) * PartitionCheck partition check expression * PartitionCheckExpr partition check expression state + * PartitionRoot relation descriptor for root parent relation * */ typedef struct ResultRelInfo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
Nagata-san, On 2017/02/23 16:17, Yugo Nagata wrote: > Hi, > > I found that a comment for PartitionRoot in ResultRelInfo is missing. > Although this is trivial, since all other members have comments, I > think it is needed. Attached is the patch to fix it. Thanks for taking care of that. + * PartitionRoot relation descriptor for parent relation Maybe: relation descriptor for root parent relation Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
Hi, I found that a comment for PartitionRoot in ResultRelInfo is missing. Although this is trivial, since all other members have comments, I think it is needed. Attached is the patch to fix it. Regards, Yugo Nagata On Tue, 27 Dec 2016 17:59:05 +0900 Amit Langote wrote: > On 2016/12/26 19:46, Amit Langote wrote: > > (Perhaps, the following should be its own new thread) > > > > I noticed that ExecProcessReturning() doesn't work properly after tuple > > routing (example shows how returning tableoid currently fails but I > > mention some other issues below): > > > > create table p (a int, b int) partition by range (a); > > create table p1 partition of p for values from (1) to (10); > > insert into p values (1) returning tableoid::regclass, *; > > tableoid | a | b > > --+---+--- > > -| 1 | > > (1 row) > > > > INSERT 0 1 > > > > I tried to fix that in 0007 to get: > > > > insert into p values (1) returning tableoid::regclass, *; > > tableoid | a | b > > --+---+--- > > p| 1 | > > (1 row) > > > > INSERT 0 1 > > > > But I think it *may* be wrong to return the root table OID for tuples > > inserted into leaf partitions, because with select we get partition OIDs: > > > > select tableoid::regclass, * from p; > > tableoid | a | b > > --+---+--- > > p1 | 1 | > > (1 row) > > > > If so, that means we should build the projection info (corresponding to > > the returning list) for each target partition somehow. ISTM, that's going > > to have to be done within the planner by appropriate inheritance > > translation of the original returning targetlist. > > Turns out getting the 2nd result may not require planner tweaks after all. > Unless I'm missing something, translation of varattnos of the RETURNING > target list can be done as late as ExecInitModifyTable() for the insert > case, unlike update/delete (which do require planner's attention). > > I updated the patch 0007 to implement the same, including the test. While > doing that, I realized map_partition_varattnos introduced in 0003 is > rather restrictive in its applicability, because it assumes varno = 1 for > the expressions it accepts as input for the mapping. Mapping returning > (target) list required modifying map_partition_varattnos to accept > target_varno as additional argument. That way, we can map arbitrary > expressions from the parent attributes numbers to partition attribute > numbers for expressions not limited to partition constraints. > > Patches 0001 to 0006 unchanged. > > Thanks, > Amit -- Yugo Nagata diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h index 6332ea0..845bdf2 100644 --- a/src/include/nodes/execnodes.h +++ b/src/include/nodes/execnodes.h @@ -326,6 +326,7 @@ typedef struct JunkFilter * onConflictSetWhere list of ON CONFLICT DO UPDATE exprs (qual) * PartitionCheck partition check expression * PartitionCheckExpr partition check expression state + * PartitionRoot relation descriptor for parent relation * */ typedef struct ResultRelInfo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On Fri, Feb 10, 2017 at 12:54 AM, Amit Langote wrote: > On 2017/02/09 15:22, amul sul wrote: >> About 0001-Check-partition-strategy-in-ATExecDropNotNull.patch, >> following test is already covered in alter_table.sql @ Line # 1918, >> instead of this kindly add test for list_partition: >> >> 77 +-- cannot drop NOT NULL constraint of a range partition key column >> 78 +ALTER TABLE range_parted ALTER a DROP NOT NULL; >> 79 + > > Thanks for the review! You're right. Updated patch attached. Committed, thanks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On 2017/02/09 15:22, amul sul wrote: > About 0001-Check-partition-strategy-in-ATExecDropNotNull.patch, > following test is already covered in alter_table.sql @ Line # 1918, > instead of this kindly add test for list_partition: > > 77 +-- cannot drop NOT NULL constraint of a range partition key column > 78 +ALTER TABLE range_parted ALTER a DROP NOT NULL; > 79 + Thanks for the review! You're right. Updated patch attached. Thanks, Amit >From a4335048e92462fb55fa6db0d830c7c066cd7011 Mon Sep 17 00:00:00 2001 From: amit Date: Fri, 10 Feb 2017 14:52:18 +0900 Subject: [PATCH] Check partition strategy in ATExecDropNotNull We should prevent dropping the NOT NULL constraint on a column only if the column is in the *range* partition key (as the error message also says). Add a regression test while at it. Reported by: Amul Sul Patch by: Amit Langote Report: https://postgr.es/m/CAAJ_b95g5AgkpJKbLajAt8ovKub874-B9X08PiOqHvVfMO2mLw%40mail.gmail.com --- src/backend/commands/tablecmds.c | 22 +- src/test/regress/expected/alter_table.out | 4 src/test/regress/sql/alter_table.sql | 5 + 3 files changed, 22 insertions(+), 9 deletions(-) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 37a4c4a3d6..f33aa70da6 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -5593,18 +5593,22 @@ ATExecDropNotNull(Relation rel, const char *colName, LOCKMODE lockmode) if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) { PartitionKey key = RelationGetPartitionKey(rel); - int partnatts = get_partition_natts(key), - i; - for (i = 0; i < partnatts; i++) + if (get_partition_strategy(key) == PARTITION_STRATEGY_RANGE) { - AttrNumber partattnum = get_partition_col_attnum(key, i); + int partnatts = get_partition_natts(key), + i; - if (partattnum == attnum) -ereport(ERROR, - (errcode(ERRCODE_INVALID_TABLE_DEFINITION), - errmsg("column \"%s\" is in range partition key", -colName))); + for (i = 0; i < partnatts; i++) + { +AttrNumber partattnum = get_partition_col_attnum(key, i); + +if (partattnum == attnum) + ereport(ERROR, + (errcode(ERRCODE_INVALID_TABLE_DEFINITION), + errmsg("column \"%s\" is in range partition key", + colName))); + } } } diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out index d8e7b61294..b0e80a7788 100644 --- a/src/test/regress/expected/alter_table.out +++ b/src/test/regress/expected/alter_table.out @@ -3029,6 +3029,10 @@ ERROR: cannot alter type of column referenced in partition key expression -- cannot drop NOT NULL on columns in the range partition key ALTER TABLE partitioned ALTER COLUMN a DROP NOT NULL; ERROR: column "a" is in range partition key +-- it's fine however to drop one on the list partition key column +CREATE TABLE list_partitioned (a int not null) partition by list (a); +ALTER TABLE list_partitioned ALTER a DROP NOT NULL; +DROP TABLE list_partitioned; -- partitioned table cannot participate in regular inheritance CREATE TABLE foo ( a int, diff --git a/src/test/regress/sql/alter_table.sql b/src/test/regress/sql/alter_table.sql index 1f551ec53c..7513769359 100644 --- a/src/test/regress/sql/alter_table.sql +++ b/src/test/regress/sql/alter_table.sql @@ -1917,6 +1917,11 @@ ALTER TABLE partitioned ALTER COLUMN b TYPE char(5); -- cannot drop NOT NULL on columns in the range partition key ALTER TABLE partitioned ALTER COLUMN a DROP NOT NULL; +-- it's fine however to drop one on the list partition key column +CREATE TABLE list_partitioned (a int not null) partition by list (a); +ALTER TABLE 
list_partitioned ALTER a DROP NOT NULL; +DROP TABLE list_partitioned; + -- partitioned table cannot participate in regular inheritance CREATE TABLE foo ( a int, -- 2.11.0 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
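A minimal sketch of the behavior the patch above settles on (table names here are illustrative, mirroring the regression test): dropping NOT NULL is rejected only for a column that is in a range partition key, while a list partition key column is unaffected.

CREATE TABLE range_parted (a int NOT NULL) PARTITION BY RANGE (a);
ALTER TABLE range_parted ALTER a DROP NOT NULL;   -- ERROR: column "a" is in range partition key
CREATE TABLE list_parted (a int NOT NULL) PARTITION BY LIST (a);
ALTER TABLE list_parted ALTER a DROP NOT NULL;    -- allowed
DROP TABLE range_parted, list_parted;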
Re: [HACKERS] Declarative partitioning - another take
About 0001-Check-partition-strategy-in-ATExecDropNotNull.patch, following test is already covered in alter_table.sql @ Line # 1918, instead of this kindly add test for list_partition: 77 +-- cannot drop NOT NULL constraint of a range partition key column 78 +ALTER TABLE range_parted ALTER a DROP NOT NULL; 79 + Regards, Amul -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On 2017/02/08 21:20, amul sul wrote: > Regarding following code in ATExecDropNotNull function, I don't see > any special check for RANGE partitioned, is it intended to have same > restriction for LIST partitioned too, I guess not? > > /* > * If the table is a range partitioned table, check that the column is not > * in the partition key. > */ > if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) > { > PartitionKey key = RelationGetPartitionKey(rel); > int partnatts = get_partition_natts(key), > i; > > for (i = 0; i < partnatts; i++) > { > AttrNumber partattnum = get_partition_col_attnum(key, i); > > if (partattnum == attnum) > ereport(ERROR, > (errcode(ERRCODE_INVALID_TABLE_DEFINITION), > errmsg("column \"%s\" is in range partition key", > colName))); > } > } Good catch! Seems like an oversight, attached fixes it. Thanks, Amit >From b84ac63b6b4acd19a09e8507cf199db127c06d71 Mon Sep 17 00:00:00 2001 From: amit Date: Thu, 9 Feb 2017 10:31:58 +0900 Subject: [PATCH] Check partition strategy in ATExecDropNotNull We should prevent dropping the NOT NULL constraint on a column only if the column is in the *range* partition key (as the error message also says). Add a regression test while at it. Reported by: Amul Sul Patch by: Amit Langote Report: https://postgr.es/m/CAAJ_b95g5AgkpJKbLajAt8ovKub874-B9X08PiOqHvVfMO2mLw%40mail.gmail.com --- src/backend/commands/tablecmds.c | 22 +- src/test/regress/expected/alter_table.out | 3 +++ src/test/regress/sql/alter_table.sql | 3 +++ 3 files changed, 19 insertions(+), 9 deletions(-) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 37a4c4a3d6..f33aa70da6 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -5593,18 +5593,22 @@ ATExecDropNotNull(Relation rel, const char *colName, LOCKMODE lockmode) if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) { PartitionKey key = RelationGetPartitionKey(rel); - int partnatts = get_partition_natts(key), - i; - for (i = 0; i < partnatts; i++) + if (get_partition_strategy(key) == PARTITION_STRATEGY_RANGE) { - AttrNumber partattnum = get_partition_col_attnum(key, i); + int partnatts = get_partition_natts(key), + i; - if (partattnum == attnum) -ereport(ERROR, - (errcode(ERRCODE_INVALID_TABLE_DEFINITION), - errmsg("column \"%s\" is in range partition key", -colName))); + for (i = 0; i < partnatts; i++) + { +AttrNumber partattnum = get_partition_col_attnum(key, i); + +if (partattnum == attnum) + ereport(ERROR, + (errcode(ERRCODE_INVALID_TABLE_DEFINITION), + errmsg("column \"%s\" is in range partition key", + colName))); + } } } diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out index d8e7b61294..e6d45fbf9c 100644 --- a/src/test/regress/expected/alter_table.out +++ b/src/test/regress/expected/alter_table.out @@ -3330,6 +3330,9 @@ ALTER TABLE list_parted2 DROP COLUMN b; ERROR: cannot drop column named in partition key ALTER TABLE list_parted2 ALTER COLUMN b TYPE text; ERROR: cannot alter type of column named in partition key +-- cannot drop NOT NULL constraint of a range partition key column +ALTER TABLE range_parted ALTER a DROP NOT NULL; +ERROR: column "a" is in range partition key -- cleanup: avoid using CASCADE DROP TABLE list_parted, part_1; DROP TABLE list_parted2, part_2, part_5, part_5_a; diff --git a/src/test/regress/sql/alter_table.sql b/src/test/regress/sql/alter_table.sql index 1f551ec53c..12edb8b3ba 100644 --- a/src/test/regress/sql/alter_table.sql +++ b/src/test/regress/sql/alter_table.sql @@ 
-2189,6 +2189,9 @@ ALTER TABLE part_2 INHERIT inh_test; ALTER TABLE list_parted2 DROP COLUMN b; ALTER TABLE list_parted2 ALTER COLUMN b TYPE text; +-- cannot drop NOT NULL constraint of a range partition key column +ALTER TABLE range_parted ALTER a DROP NOT NULL; + -- cleanup: avoid using CASCADE DROP TABLE list_parted, part_1; DROP TABLE list_parted2, part_2, part_5, part_5_a; -- 2.11.0 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
Hi Amit, Regarding following code in ATExecDropNotNull function, I don't see any special check for RANGE partitioned, is it intended to have same restriction for LIST partitioned too, I guess not? /* * If the table is a range partitioned table, check that the column is not * in the partition key. */ if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) { PartitionKey key = RelationGetPartitionKey(rel); int partnatts = get_partition_natts(key), i; for (i = 0; i < partnatts; i++) { AttrNumber partattnum = get_partition_col_attnum(key, i); if (partattnum == attnum) ereport(ERROR, (errcode(ERRCODE_INVALID_TABLE_DEFINITION), errmsg("column \"%s\" is in range partition key", colName))); } } Regards, Amul -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On Mon, Jan 30, 2017 at 4:42 PM, Peter Eisentraut wrote: > On 1/25/17 12:54 AM, Ashutosh Bapat wrote: >> The documentation available at >> https://www.postgresql.org/docs/devel/static/sql-createtable.html, >> does not make it clear that the lower bound of a range partition is >> always inclusive and the higher one is exclusive. I think a note in >> section " PARTITION OF parent_table FOR VALUES partition_bound_spec" >> would be helpful. > > Hmm. I see the practical use of that, but I think this is going to be a > source of endless confusion. Can we make that a bit clearer in the > syntax, for example by using additional keywords (INCLUSIVE/EXCLUSIVE)? I am not in favor of adding mandatory noise words to the syntax; it's fairly verbose already. I think it would be reasonable to eventually consider supporting optional keywords to allow multiple behaviors, but I don't think that the usefulness of that has been so firmly established that we should do it right this minute. I think there are a heck of a lot of other things about this partitioning implementation that are more urgently in need of improvement. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/31 6:42, Peter Eisentraut wrote: > On 1/25/17 12:54 AM, Ashutosh Bapat wrote: >> The documentation available at >> https://www.postgresql.org/docs/devel/static/sql-createtable.html, >> does not make it clear that the lower bound of a range partition is >> always inclusive and the higher one is exclusive. I think a note in >> section " PARTITION OF parent_table FOR VALUES partition_bound_spec" >> would be helpful. > > Hmm. I see the practical use of that, but I think this is going to be a > source of endless confusion. Can we make that a bit clearer in the > syntax, for example by using additional keywords (INCLUSIVE/EXCLUSIVE)? The decision not to make that configurable with INCLUSIVE/EXCLUSIVE syntax was deliberate. To summarize, we can start with a default configuration catering to most practical cases (that is, inclusive lower and exclusive upper bounds) and documenting so (not done yet, which I will post a doc patch today for). If it turns out that there is some demand for making that configurable, we can later add the code to handle that internally plus the syntax. But *starting* with that syntax means we have to potentially needlessly carry the code to handle seldom used cases that could not be made as efficient as it is now with all lower bounds being inclusive and upper bounds exclusive. Thanks, Amit [1] https://www.postgresql.org/message-id/CA%2BTgmoZou4ApEvC_nfhOxsi5G4SoD_evwNaiYn60ZcJ4XB_-QQ%40mail.gmail.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
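A quick sketch of the default being described (illustrative names): the lower bound of a range partition is inclusive and the upper bound is exclusive, so consecutive partitions can share a boundary value without overlapping.

CREATE TABLE measurement (logdate date NOT NULL) PARTITION BY RANGE (logdate);
CREATE TABLE measurement_y2016 PARTITION OF measurement
  FOR VALUES FROM ('2016-01-01') TO ('2017-01-01');
CREATE TABLE measurement_y2017 PARTITION OF measurement
  FOR VALUES FROM ('2017-01-01') TO ('2018-01-01');
INSERT INTO measurement VALUES ('2016-12-31');  -- routed to measurement_y2016
INSERT INTO measurement VALUES ('2017-01-01');  -- routed to measurement_y2017: FROM is inclusive, TO is exclusive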
Re: [HACKERS] Declarative partitioning - another take
On 1/25/17 12:54 AM, Ashutosh Bapat wrote: > The documentation available at > https://www.postgresql.org/docs/devel/static/sql-createtable.html, > does not make it clear that the lower bound of a range partition is > always inclusive and the higher one is exclusive. I think a note in > section " PARTITION OF parent_table FOR VALUES partition_bound_spec" > would be helpful. Hmm. I see the practical use of that, but I think this is going to be a source of endless confusion. Can we make that a bit clearer in the syntax, for example by using additional keywords (INCLUSIVE/EXCLUSIVE)? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning vs. information_schema
On Wed, Jan 25, 2017 at 1:04 PM, Peter Eisentraut wrote: > On 1/18/17 2:32 PM, Robert Haas wrote: >> Unless we can find something official, I suppose we should just >> display BASE TABLE in that case as we do in other cases. I wonder if >> the schema needs some broader revision; for example, are there >> information_schema elements intended to show information about >> partitions? > > Is it intentional that we show the partitions by default in \d, > pg_tables, information_schema.tables? Or should we treat those as > somewhat-hidden details? I'm not really sure what the right thing to do is there. I was hoping you had an opinion. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning vs. information_schema
On 1/18/17 2:32 PM, Robert Haas wrote: > Unless we can find something official, I suppose we should just > display BASE TABLE in that case as we do in other cases. I wonder if > the schema needs some broader revision; for example, are there > information_schema elements intended to show information about > partitions? Is it intentional that we show the partitions by default in \d, pg_tables, information_schema.tables? Or should we treat those as somewhat-hidden details? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
Hi Ashutosh, On 2017/01/25 14:54, Ashutosh Bapat wrote: > The documentation available at > https://www.postgresql.org/docs/devel/static/sql-createtable.html, > does not make it clear that the lower bound of a range partition is > always inclusive and the higher one is exclusive. I think a note in > section " PARTITION OF parent_table FOR VALUES partition_bound_spec" > would be helpful. Yes, I'm working on a documentation patch for that and a few other things such as revising the Partitioning chapter. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
The documentation available at https://www.postgresql.org/docs/devel/static/sql-createtable.html, does not make it clear that the lower bound of a range partition is always inclusive and the higher one is exclusive. I think a note in section " PARTITION OF parent_table FOR VALUES partition_bound_spec" would be helpful. On Wed, Jan 25, 2017 at 7:11 AM, Amit Langote wrote: > On 2017/01/25 5:55, Robert Haas wrote: >> On Thu, Jan 19, 2017 at 9:58 PM, Amit Langote >> wrote: >>> [ new patches ] >> >> Committed 0001 and 0002. See my earlier email for comments on 0003. > > It seems patches for all the issues mentioned in this thread so far, > except 0003 (I just sent an updated version in another email), have been > committed. Thanks a lot for your time. I will create new threads for any > more patches from here on. > > Regards, > Amit > > -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/25 5:55, Robert Haas wrote: > On Thu, Jan 19, 2017 at 9:58 PM, Amit Langote > wrote: >> [ new patches ] > > Committed 0001 and 0002. See my earlier email for comments on 0003. It seems patches for all the issues mentioned in this thread so far, except 0003 (I just sent an updated version in another email), have been committed. Thanks a lot for your time. I will create new threads for any more patches from here on. Regards, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/25 2:56, Robert Haas wrote: > On Thu, Jan 19, 2017 at 9:58 PM, Amit Langote > wrote: >>> But I wonder why we don't instead just change this function to >>> consider tdhasoid rather than tdtypeid. I mean, if the only point of >>> comparing the type OIDs is to find out whether the table-has-OIDs >>> setting matches, we could instead test that directly and avoid needing >>> to pass an extra argument. I wonder if there's some other reason this >>> code is there which is not documented in the comment... >> >> With the following patch, regression tests run fine: >> >> if (indesc->natts == outdesc->natts && >> - indesc->tdtypeid == outdesc->tdtypeid) >> + indesc->tdhasoid != outdesc->tdhasoid) >> { >> >> If checking tdtypeid (instead of tdhasoid directly) has some other >> consideration, I'd would have seen at least some tests broken by this >> change. So, if we are to go with this, I too prefer it over my previous >> proposal to add an argument to convert_tuples_by_name(). Attached 0003 >> implements the above approach. > > I think this is not quite right. First, the patch compares the > tdhasoid status with != rather than ==, which would have the effect of > saying that we can skip conversion of the has-OID statuses do NOT > match. That can't be right. You're right. > Second, I believe that the comments > imply that conversion should be done if *either* tuple has OIDs. I > believe that's because whoever wrote this comment thought that we > needed to replace the OID if the tuple already had one, which is what > do_convert_tuple would do. I'm not sure whether that's really > necessary, but we're less likely to break anything if we preserve the > existing behavior, and I don't think we lose much from doing so > because few user tables will have OIDs. So I would change this test > to if (indesc->natts == outdesc->natts && !indesc->tdhasoid && > !outdesc->tdhasoid), and I'd revise the one in > convert_tuples_by_position() to match. Then I think it's much clearer > that we're just optimizing what's there already, not changing the > behavior. Agreed. Updated patch attached. Thanks, Amit >From c1fa4b9f04f328b8df54ef487ee9d46f5978e0de Mon Sep 17 00:00:00 2001 From: amit Date: Mon, 26 Dec 2016 17:44:14 +0900 Subject: [PATCH] Avoid tuple coversion in common partitioning cases Currently, the tuple conversion is performed after a tuple is routed, even if the attributes of a target leaf partition map one-to-one with those of the root table, which is wasteful. Avoid that by making convert_tuples_by_name() return a NULL map for such cases. Reported by: n/a Patch by: Amit Langote Reports: n/a --- src/backend/access/common/tupconvert.c | 18 ++ src/backend/catalog/partition.c| 3 +-- 2 files changed, 11 insertions(+), 10 deletions(-) diff --git a/src/backend/access/common/tupconvert.c b/src/backend/access/common/tupconvert.c index b17ceafa6e..a4012525d8 100644 --- a/src/backend/access/common/tupconvert.c +++ b/src/backend/access/common/tupconvert.c @@ -138,12 +138,13 @@ convert_tuples_by_position(TupleDesc indesc, nincols, noutcols))); /* - * Check to see if the map is one-to-one and the tuple types are the same. - * (We check the latter because if they're not, we want to do conversion - * to inject the right OID into the tuple datum.) + * Check to see if the map is one-to-one, in which case we need not do + * the tuple conversion. That's not enough though if either source or + * destination (tuples) contains OIDs; we'd need conversion in that case + * to inject the right OID into the tuple datum. 
*/ if (indesc->natts == outdesc->natts && - indesc->tdtypeid == outdesc->tdtypeid) + !indesc->tdhasoid && !outdesc->tdhasoid) { for (i = 0; i < n; i++) { @@ -214,12 +215,13 @@ convert_tuples_by_name(TupleDesc indesc, attrMap = convert_tuples_by_name_map(indesc, outdesc, msg); /* - * Check to see if the map is one-to-one and the tuple types are the same. - * (We check the latter because if they're not, we want to do conversion - * to inject the right OID into the tuple datum.) + * Check to see if the map is one-to-one, in which case we need not do + * the tuple conversion. That's not enough though if either source or + * destination (tuples) contains OIDs; we'd need conversion in that case + * to inject the right OID into the tuple datum. */ if (indesc->natts == outdesc->natts && - indesc->tdtypeid == outdesc->tdtypeid) + !indesc->tdhasoid && !outdesc->tdhasoid) { same = true; for (i = 0; i < n; i++) diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index 7be60529c5..4bcef58763 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -1700,12 +1700,11 @@ get_partition_for_tuple(PartitionDispatch *pd, return -1; } - if (myslot != NULL) + if (myslot != NULL && map != NULL) { HeapTuple tuple = ExecFetchSlotTuple(slot); Ex
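For reference, a sketch of the two cases this optimization distinguishes (illustrative tables without OIDs; the attribute map itself is internal): a partition created with the same attribute layout as the root needs no conversion after routing, whereas an attached table whose columns are in a different physical order still does.

CREATE TABLE root (a int, b text) PARTITION BY LIST (a);
-- Same attribute layout as the parent: the name-based map is one-to-one,
-- so convert_tuples_by_name() can return NULL and routing skips conversion.
CREATE TABLE part1 PARTITION OF root FOR VALUES IN (1);
-- Columns declared in a different order and then attached: the attributes
-- no longer line up by position, so a conversion map is still needed.
CREATE TABLE part2 (b text, a int);
ALTER TABLE root ATTACH PARTITION part2 FOR VALUES IN (2);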
Re: [HACKERS] Declarative partitioning - another take
Hi Keith, On 2017/01/20 12:40, Keith Fiske wrote: > So testing things out in pg_partman for native sub-partitioning and ran > into what is a bug for me that I know I have to fix, but I'm curious if > this can be prevented in the first place within the native partitioning > code itself. The below shows a sub-partitioning set where the sub-partition > has a constraint range that is outside of the range of its parent. If the > columns were different I could see where this would be allowed, but the > same column is used throughout the levels of sub-partitioning. > Understandable if that may be too complex to check for, but figured I'd > bring it up as something I accidentally ran into in case you see an easy > way to prevent it. This was discussed. See Robert's response (2nd part of): https://www.postgresql.org/message-id/CA%2BTgmoaQABrsLQK4ms_4NiyavyJGS-b6ZFkZBBNC%2B-P5DjJNFA%40mail.gmail.com In short, if partitions at different levels are defined such that some of the data the user intended to insert into the table (the part of the sub-partition's range that doesn't overlap with its parent's) can never actually be inserted, that's an operator error. It's like adding contradictory check constraints to the table: create table foo (a int check (a > 0 and a < 0)); insert into foo values (1); ERROR: new row for relation "foo" violates check constraint "foo_a_check" DETAIL: Failing row contains (1). One (perhaps the only) thing that could be done is to warn users to prevent this kind of mistake through documentation. Trying to do anything else in the core partitioning code would make it too complicated for not much benefit (also see Robert's last line, :)). Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
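A stripped-down sketch of the scenario (made-up names; behavior as described above): the DDL is accepted, but rows falling in the part of the sub-partition's range that lies outside its parent's range can never reach it through the root table.

CREATE TABLE p (c timestamptz NOT NULL) PARTITION BY RANGE (c);
CREATE TABLE p_2015 PARTITION OF p
  FOR VALUES FROM ('2015-01-01') TO ('2016-01-01')
  PARTITION BY RANGE (c);
-- Accepted, even though the range lies entirely outside the parent's range:
CREATE TABLE p_2015_2017_03 PARTITION OF p_2015
  FOR VALUES FROM ('2017-03-01') TO ('2017-04-01');
-- Fails: no top-level partition of p accepts this value, so it can never
-- be routed to p_2015_2017_03 through p.
INSERT INTO p VALUES ('2017-03-15');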
Re: [HACKERS] Declarative partitioning - another take
On Thu, Jan 19, 2017 at 9:58 PM, Amit Langote wrote: > [ new patches ] Committed 0001 and 0002. See my earlier email for comments on 0003. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On Thu, Jan 19, 2017 at 9:58 PM, Amit Langote wrote: >> But I wonder why we don't instead just change this function to >> consider tdhasoid rather than tdtypeid. I mean, if the only point of >> comparing the type OIDs is to find out whether the table-has-OIDs >> setting matches, we could instead test that directly and avoid needing >> to pass an extra argument. I wonder if there's some other reason this >> code is there which is not documented in the comment... > > With the following patch, regression tests run fine: > > if (indesc->natts == outdesc->natts && > - indesc->tdtypeid == outdesc->tdtypeid) > + indesc->tdhasoid != outdesc->tdhasoid) > { > > If checking tdtypeid (instead of tdhasoid directly) has some other > consideration, I'd would have seen at least some tests broken by this > change. So, if we are to go with this, I too prefer it over my previous > proposal to add an argument to convert_tuples_by_name(). Attached 0003 > implements the above approach. I think this is not quite right. First, the patch compares the tdhasoid status with != rather than ==, which would have the effect of saying that we can skip conversion of the has-OID statuses do NOT match. That can't be right. Second, I believe that the comments imply that conversion should be done if *either* tuple has OIDs. I believe that's because whoever wrote this comment thought that we needed to replace the OID if the tuple already had one, which is what do_convert_tuple would do. I'm not sure whether that's really necessary, but we're less likely to break anything if we preserve the existing behavior, and I don't think we lose much from doing so because few user tables will have OIDs. So I would change this test to if (indesc->natts == outdesc->natts && !indesc->tdhasoid && !outdesc->tdhasoid), and I'd revise the one in convert_tuples_by_position() to match. Then I think it's much clearer that we're just optimizing what's there already, not changing the behavior. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning vs. BulkInsertState
On Mon, Jan 23, 2017 at 5:25 AM, Amit Langote wrote: > I tried implementing the second idea in the attached patch. It fixes the > bug (multiple reports as mentioned in the commit message) that tuples may > be inserted into the wrong partition. Looks good to me, thanks. Committed with a few tweaks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning vs. BulkInsertState
On 2017/01/19 5:25, Robert Haas wrote: > On Wed, Jan 11, 2017 at 10:53 PM, Amit Langote wrote: >> On 2017/01/06 20:23, Amit Langote wrote: >>> >>> If a single BulkInsertState object is passed to >>> heap_insert()/heap_multi_insert() for different heaps corresponding to >>> different partitions (from one input tuple to next), tuples might end up >>> going into wrong heaps (like demonstrated in one of the reports [1]). A >>> simple solution is to disable bulk-insert in case of partitioned tables. >>> >>> But my patch (or its motivations) was slightly wrongheaded, wherein I >>> conflated multi-insert stuff and bulk-insert considerations. I revised >>> 0002 to not do that. >> >> Ragnar Ouchterlony pointed out [1] on pgsql-bugs that 0002 wasn't correct. >> Attaching updated 0002 along with rebased 0001 and 0003. > > The BulkInsertState is not there only to improve performance. It's > also there to make sure we use a BufferAccessStrategy, so that we > don't trash the whole buffer arena. See commit > 85e2cedf985bfecaf43a18ca17433070f439fb0e. If a partitioned table uses > a separate BulkInsertState for each partition, I believe it will also > end up using a separate ring of buffers for every partition. That may > well be faster than copying into an unpartitioned table in some cases, > because dirtying everything in the buffer arena without actually > writing any of those buffers is a lot faster than actually doing the > writes. But it is also anti-social behavior; we have > BufferAccessStrategy objects for a reason. Thanks for the explanation. I agree that creating thousands of BufferAccessStrategy objects would be a bad idea. > One idea would be to have each partition use a separate > BulkInsertState but have them point to the same underlying > BufferAccessStrategy, but even that's problematic, because it could > result in us holding a gigantic number of pins (one per partition). I > think maybe a better idea would be to add an additional function > ReleaseBulkInsertStatePin() which gets called whenever we switch > relations, and then just use the same BulkInsertState throughout. I tried implementing the second idea in the attached patch. It fixes the bug (multiple reports as mentioned in the commit message) that tuples may be inserted into the wrong partition. Thanks, Amit >From 2e31e97389b0911f749fcf59c0a24c0208a5e2b3 Mon Sep 17 00:00:00 2001 From: amit Date: Mon, 23 Jan 2017 17:00:19 +0900 Subject: [PATCH] Make partitioned tables play nicely with bulk-insert MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When COPY'ing (or bulk-inserting) into a partitioned table, the target heap may change from one tuple to next. We better ask ReadBufferBI() to get a new buffer every time such change occurs. We do that by having CopyFrom() call the new ReleaseBulkInsertStatePin() which unpins the BulkInsertState.cur_buffer and sets it to InvalidBuffer, which is enough a signal for ReadBufferBI() to do the right thing. This also fixes the bug that tuples ended up being inserted into the wrong partition, which occurred exactly because the wrong buffer was used. 
Reported by: 高增琦, Venkata B Nagothi, Ragnar Ouchterlony Patch by: Amit Langote (idea by Robert Haas) Reports: https://www.postgresql.org/message-id/CAFmBtr32FDOqofo8yG-4mjzL1HnYHxXK5S9OGFJ%3D%3DcJpgEW4vA%40mail.gmail.com https://www.postgresql.org/message-id/CAEyp7J9WiX0L3DoiNcRrY-9iyw%3DqP%2Bj%3DDLsAnNFF1xT2J1ggfQ%40mail.gmail.com https://www.postgresql.org/message-id/16d73804-c9cd-14c5-463e-5caad563ff77%40agama.tv --- src/backend/access/heap/heapam.c | 11 +++ src/backend/commands/copy.c | 12 src/include/access/heapam.h | 1 + 3 files changed, 24 insertions(+) diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index 1ce42ea970..5fd7f1e1a2 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -2324,6 +2324,17 @@ FreeBulkInsertState(BulkInsertState bistate) pfree(bistate); } +/* + * ReleaseBulkInsertStatePin - release a buffer currently held in bistate + */ +void +ReleaseBulkInsertStatePin(BulkInsertState bistate) +{ + if (bistate->current_buf != InvalidBuffer) + ReleaseBuffer(bistate->current_buf); + bistate->current_buf = InvalidBuffer; +} + /* * heap_insert - insert tuple into a heap diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index c05e14e26f..86bd53c179 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -2307,6 +2307,7 @@ CopyFrom(CopyState cstate) uint64 processed = 0; bool useHeapMultiInsert; int nBufferedTuples = 0; + int prev_leaf_part_index = -1; #define MAX_BUFFERED_TUPLES 1000 HeapTuple *bufferedTuples = NULL; /* initialize to silence warning */ @@ -2562,6 +2563,17 @@ CopyFrom(CopyState cstate) leaf_part_index < cstate->num_partitions); /* + * If the current tuple is mapped to a
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/21 6:29, Robert Haas wrote: > On Fri, Jan 20, 2017 at 1:15 AM, Andres Freund wrote: >> On 2017-01-19 14:18:23 -0500, Robert Haas wrote: >>> Committed. >> >> One of the patches around this topic committed recently seems to cause >> valgrind failures like >> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-01-19%2008%3A40%3A02 >> : > > Tom Lane independently reported this same issue. I believe I've fixed > it. Sorry for not noticing and crediting this report also. Thanks Robert for looking at this and committing the fix. Regards, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On Fri, Jan 20, 2017 at 1:15 AM, Andres Freund wrote: > On 2017-01-19 14:18:23 -0500, Robert Haas wrote: >> Committed. > > One of the patches around this topic committed recently seems to cause > valgrind failures like > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-01-19%2008%3A40%3A02 > : Tom Lane independently reported this same issue. I believe I've fixed it. Sorry for not noticing and crediting this report also. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
Hi Andres, On 2017/01/20 15:15, Andres Freund wrote: > On 2017-01-19 14:18:23 -0500, Robert Haas wrote: >> Committed. > > One of the patches around this topic committed recently seems to cause > valgrind failures like > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-01-19%2008%3A40%3A02 > : > ==24969== Conditional jump or move depends on uninitialised value(s) > ==24969==at 0x4C38DA: btint4cmp (nbtcompare.c:97) > ==24969==by 0x83860C: FunctionCall2Coll (fmgr.c:1318) > ==24969==by 0x536643: partition_bounds_equal (partition.c:627) > ==24969==by 0x820864: equalPartitionDescs (relcache.c:1203) > ==24969==by 0x82423A: RelationClearRelation (relcache.c:2553) > ==24969==by 0x8248BA: RelationFlushRelation (relcache.c:2662) > ==24969==by 0x824983: RelationCacheInvalidateEntry (relcache.c:2714) > ==24969==by 0x81D9D6: LocalExecuteInvalidationMessage (inval.c:568) > ==24969==by 0x81CB0D: ProcessInvalidationMessages (inval.c:444) > ==24969==by 0x81D3CB: CommandEndInvalidationMessages (inval.c:1056) > ==24969==by 0x4F6735: AtCCI_LocalCache (xact.c:1374) > ==24969==by 0x4F8249: CommandCounterIncrement (xact.c:957) > ==24969== Uninitialised value was created by a heap allocation > ==24969==at 0x85AA83: palloc (mcxt.c:914) > ==24969==by 0x53648E: RelationBuildPartitionDesc (partition.c:528) > ==24969==by 0x823F93: RelationBuildDesc (relcache.c:1348) > ==24969==by 0x8241DB: RelationClearRelation (relcache.c:2524) > ==24969==by 0x8248BA: RelationFlushRelation (relcache.c:2662) > ==24969==by 0x824983: RelationCacheInvalidateEntry (relcache.c:2714) > ==24969==by 0x81D9D6: LocalExecuteInvalidationMessage (inval.c:568) > ==24969==by 0x81CB0D: ProcessInvalidationMessages (inval.c:444) > ==24969==by 0x81D3CB: CommandEndInvalidationMessages (inval.c:1056) > ==24969==by 0x4F6735: AtCCI_LocalCache (xact.c:1374) > ==24969==by 0x4F8249: CommandCounterIncrement (xact.c:957) > ==24969==by 0x82538B: RelationSetNewRelfilenode (relcache.c:3490) > ==24969== > ==24969== VALGRINDERROR-END Thanks for the report. This being my first time reading a valgrind report on buildfarm, is it correct to assume that the command immediately preceding VALGRINDERROR-BEGIN is what triggered the failure? ... LOG: statement: truncate list_parted; ==24969== VALGRINDERROR-BEGIN ==24969== Conditional jump or move depends on uninitialised value(s) ==24969==at 0x4C38DA: btint4cmp (nbtcompare.c:97) So in this case: truncate list_parted? Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On 2017-01-19 14:18:23 -0500, Robert Haas wrote: > Committed. One of the patches around this topic committed recently seems to cause valgrind failures like https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-01-19%2008%3A40%3A02 : ==24969== Conditional jump or move depends on uninitialised value(s) ==24969==at 0x4C38DA: btint4cmp (nbtcompare.c:97) ==24969==by 0x83860C: FunctionCall2Coll (fmgr.c:1318) ==24969==by 0x536643: partition_bounds_equal (partition.c:627) ==24969==by 0x820864: equalPartitionDescs (relcache.c:1203) ==24969==by 0x82423A: RelationClearRelation (relcache.c:2553) ==24969==by 0x8248BA: RelationFlushRelation (relcache.c:2662) ==24969==by 0x824983: RelationCacheInvalidateEntry (relcache.c:2714) ==24969==by 0x81D9D6: LocalExecuteInvalidationMessage (inval.c:568) ==24969==by 0x81CB0D: ProcessInvalidationMessages (inval.c:444) ==24969==by 0x81D3CB: CommandEndInvalidationMessages (inval.c:1056) ==24969==by 0x4F6735: AtCCI_LocalCache (xact.c:1374) ==24969==by 0x4F8249: CommandCounterIncrement (xact.c:957) ==24969== Uninitialised value was created by a heap allocation ==24969==at 0x85AA83: palloc (mcxt.c:914) ==24969==by 0x53648E: RelationBuildPartitionDesc (partition.c:528) ==24969==by 0x823F93: RelationBuildDesc (relcache.c:1348) ==24969==by 0x8241DB: RelationClearRelation (relcache.c:2524) ==24969==by 0x8248BA: RelationFlushRelation (relcache.c:2662) ==24969==by 0x824983: RelationCacheInvalidateEntry (relcache.c:2714) ==24969==by 0x81D9D6: LocalExecuteInvalidationMessage (inval.c:568) ==24969==by 0x81CB0D: ProcessInvalidationMessages (inval.c:444) ==24969==by 0x81D3CB: CommandEndInvalidationMessages (inval.c:1056) ==24969==by 0x4F6735: AtCCI_LocalCache (xact.c:1374) ==24969==by 0x4F8249: CommandCounterIncrement (xact.c:957) ==24969==by 0x82538B: RelationSetNewRelfilenode (relcache.c:3490) ==24969== ==24969== VALGRINDERROR-END Regards, Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
So testing things out in pg_partman for native sub-partitioning and ran into what is a bug for me that I know I have to fix, but I'm curious if this can be prevented in the first place within the native partitioning code itself. The below shows a sub-partitioning set where the sub-partition has a constraint range that is outside of the range of its parent. If the columns were different I could see where this would be allowed, but the same column is used throughout the levels of sub-partitioning. Understandable if that may be too complex to check for, but figured I'd bring it up as something I accidentally ran into in case you see an easy way to prevent it. This was encountered as of commit ba61a04bc7fefeee03416d9911eb825c4897c223. keith@keith=# \d+ partman_test.time_taptest_table Table "partman_test.time_taptest_table" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description +--+---+--+-+--+--+- col1 | integer | | | | plain| | col2 | text | | | | extended | | col3 | timestamp with time zone | | not null | now() | plain| | Partition key: RANGE (col3) Partitions: partman_test.time_taptest_table_p2015 FOR VALUES FROM ('2015-01-01 00:00:00-05') TO ('2016-01-01 00:00:00-05'), partman_test.time_taptest_table_p2016 FOR VALUES FROM ('2016-01-01 00:00:00-05') TO ('2017-01-01 00:00:00-05'), partman_test.time_taptest_table_p2017 FOR VALUES FROM ('2017-01-01 00:00:00-05') TO ('2018-01-01 00:00:00-05'), partman_test.time_taptest_table_p2018 FOR VALUES FROM ('2018-01-01 00:00:00-05') TO ('2019-01-01 00:00:00-05'), partman_test.time_taptest_table_p2019 FOR VALUES FROM ('2019-01-01 00:00:00-05') TO ('2020-01-01 00:00:00-05') keith@keith=# \d+ partman_test.time_taptest_table_p2015 Table "partman_test.time_taptest_table_p2015" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description +--+---+--+-+--+--+- col1 | integer | | | | plain| | col2 | text | | | | extended | | col3 | timestamp with time zone | | not null | now() | plain| | Partition of: partman_test.time_taptest_table FOR VALUES FROM ('2015-01-01 00:00:00-05') TO ('2016-01-01 00:00:00-05') Partition key: RANGE (col3) Partitions: partman_test.time_taptest_table_p2015_p2016_11 FOR VALUES FROM ('2016-11-01 00:00:00-04') TO ('2016-12-01 00:00:00-05'), partman_test.time_taptest_table_p2015_p2016_12 FOR VALUES FROM ('2016-12-01 00:00:00-05') TO ('2017-01-01 00:00:00-05'), partman_test.time_taptest_table_p2015_p2017_01 FOR VALUES FROM ('2017-01-01 00:00:00-05') TO ('2017-02-01 00:00:00-05'), partman_test.time_taptest_table_p2015_p2017_02 FOR VALUES FROM ('2017-02-01 00:00:00-05') TO ('2017-03-01 00:00:00-05'), partman_test.time_taptest_table_p2015_p2017_03 FOR VALUES FROM ('2017-03-01 00:00:00-05') TO ('2017-04-01 00:00:00-04') keith@keith=# \d+ partman_test.time_taptest_table_p2015_p2017_03 Table "partman_test.time_taptest_table_p2015_p2017_03" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description +--+---+--+-+--+--+- col1 | integer | | | | plain| | col2 | text | | | | extended | | col3 | timestamp with time zone | | not null | now() | plain| | Partition of: partman_test.time_taptest_table_p2015 FOR VALUES FROM ('2017-03-01 00:00:00-05') TO ('2017-04-01 00:00:00-04') -- Keith Fiske Database Administrator OmniTI Computer Consulting, Inc. http://www.keithf4.com
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/20 4:18, Robert Haas wrote: > On Thu, Jan 19, 2017 at 12:15 AM, Amit Langote wrote: >> 0002-Set-ecxt_scantuple-correctly-for-tuple-routing.patch >> >> In 2ac3ef7a01df859c62d0a02333b646d65eaec5ff, we changed things so that >> it's possible for a different TupleTableSlot to be used for partitioned >> tables at successively lower levels. If we do end up changing the slot >> from the original, we must update ecxt_scantuple to point to the new one >> for partition key of the tuple to be computed correctly. >> >> Last posted here: >> https://www.postgresql.org/message-id/99fbfad6-b89b-9427-a6ca-197aad98c48e%40lab.ntt.co.jp > > Why does FormPartitionKeyDatum need this? Could that be documented in > a comment in here someplace, perhaps the header comment to > FormPartitionKeyDatum? FormPartitionKeyDatum() computes partition key expressions (if any) and the expression evaluation machinery expects ecxt_scantuple to point to the tuple to extract attribute values from. FormPartitionKeyDatum() already has a tiny comment, which it seems is the only thing we could say here about this there: * the ecxt_scantuple slot of estate's per-tuple expr context must point to * the heap tuple passed in. In get_partition_for_tuple() which calls FormPartitionKeyDatum(), the patch adds this comment (changed it a little from the last version): + /* + * Extract partition key from tuple. Expression evaluation machinery + * that FormPartitionKeyDatum() invokes expects ecxt_scantuple to + * point to the correct tuple slot. The slot might have changed from + * what was used for the parent table if the table of the current + * partitioning level has different tuple descriptor from the parent. + * So update ecxt_scantuple accordingly. + */ + ecxt->ecxt_scantuple = slot; FormPartitionKeyDatum(parent, slot, estate, values, isnull); It says why we need to change which slot ecxt_scantuple points to. >> 0004-Fix-some-issues-with-views-and-partitioned-tables.patch >> >> Automatically updatable views failed to handle partitioned tables. >> Once that's fixed, WITH CHECK OPTIONS wouldn't work correctly without >> the WCO expressions having been suitably converted for each partition >> (think applying map_partition_varattnos to Vars in the WCO expressions >> just like with partition constraint expressions). > > The changes to execMain.c contain a hunk which has essentially the > same code twice. That looks like a mistake. Also, the patch doesn't > compile because convert_tuples_by_name() takes 3 arguments, not 4. Actually, I realized that and sent the updated version [1] of this patch that fixed this issue. In the updated version, I removed that code block (the 2 copies of it), because we are still discussing what to do about showing tuples in constraint violation (in this case, WITH CHECK OPTION violation) messages. Anyway, attached here again. >> 0006-Avoid-tuple-coversion-in-common-partitioning-cases.patch >> >> Currently, the tuple conversion is performed after a tuple is routed, >> even if the attributes of a target leaf partition map one-to-one with >> those of the root table, which is wasteful. Avoid that by making >> convert_tuples_by_name() return a NULL map for such cases. > > +Assert(!consider_typeid && indesc->tdhasoid == outdesc->tdhasoid); > > I think you mean Assert(consider_typeid || indesc->tdhasoid == > outdesc->tdhasoid); Ah, you're right. > But I wonder why we don't instead just change this function to > consider tdhasoid rather than tdtypeid. 
I mean, if the only point of > comparing the type OIDs is to find out whether the table-has-OIDs > setting matches, we could instead test that directly and avoid needing > to pass an extra argument. I wonder if there's some other reason this > code is there which is not documented in the comment... With the following patch, regression tests run fine: if (indesc->natts == outdesc->natts && - indesc->tdtypeid == outdesc->tdtypeid) + indesc->tdhasoid != outdesc->tdhasoid) { If checking tdtypeid (instead of tdhasoid directly) has some other consideration, I'd would have seen at least some tests broken by this change. So, if we are to go with this, I too prefer it over my previous proposal to add an argument to convert_tuples_by_name(). Attached 0003 implements the above approach. > Phew. Thanks for all the patches, sorry I'm having trouble keeping up. Thanks a lot for taking time to look through them and committing. Thanks, Amit [1] https://www.postgresql.org/message-id/92fe2a71-5eb7-ee8d-53ef-cfd5a65dfc3d%40lab.ntt.co.jp >From c7088290221a9fe0818139145b7bdf6731bc419a Mon Sep 17 00:00:00 2001 From: amit Date: Wed, 28 Dec 2016 10:10:26 +0900 Subject: [PATCH 1/3] Set ecxt_scantuple correctly for tuple-routing In 2ac3ef7a01df859c62d0a02333b646d65eaec5ff, we changed things so that it's possible for a different TupleTableSlot to be used for partitioned tables at successively lower levels. If we do end up changing the slot fro
Re: [HACKERS] Declarative partitioning - another take
On Thu, Jan 19, 2017 at 12:15 AM, Amit Langote wrote: > 0001-Fix-a-bug-of-insertion-into-an-internal-partition.patch > > Since implicit partition constraints are not inherited, an internal > partition's constraint was not being enforced when targeted directly. > So, include such constraint when setting up leaf partition result > relations for tuple-routing. Committed. > 0002-Set-ecxt_scantuple-correctly-for-tuple-routing.patch > > In 2ac3ef7a01df859c62d0a02333b646d65eaec5ff, we changed things so that > it's possible for a different TupleTableSlot to be used for partitioned > tables at successively lower levels. If we do end up changing the slot > from the original, we must update ecxt_scantuple to point to the new one > for partition key of the tuple to be computed correctly. > > Last posted here: > https://www.postgresql.org/message-id/99fbfad6-b89b-9427-a6ca-197aad98c48e%40lab.ntt.co.jp Why does FormPartitionKeyDatum need this? Could that be documented in a comment in here someplace, perhaps the header comment to FormPartitionKeyDatum? > 0003-Fix-RETURNING-to-work-correctly-after-tuple-routing.patch > > In ExecInsert(), do not switch back to the root partitioned table > ResultRelInfo until after we finish ExecProcessReturning(), so that > RETURNING projection is done using the partition's descriptor. For > the projection to work correctly, we must initialize the same for > each leaf partition during ModifyTableState initialization. Committed. > 0004-Fix-some-issues-with-views-and-partitioned-tables.patch > > Automatically updatable views failed to handle partitioned tables. > Once that's fixed, WITH CHECK OPTIONS wouldn't work correctly without > the WCO expressions having been suitably converted for each partition > (think applying map_partition_varattnos to Vars in the WCO expressions > just like with partition constraint expressions). The changes to execMain.c contain a hunk which has essentially the same code twice. That looks like a mistake. Also, the patch doesn't compile because convert_tuples_by_name() takes 3 arguments, not 4. > 0005-Fix-some-wrong-thinking-in-check_new_partition_bound.patch > > Because a given range bound in the PartitionBoundInfo.datums array > is sometimes a range lower bound and upper bound at other times, we > must be careful when assuming which, especially when interpreting > the result of partition_bound_bsearch which returns the index of the > greatest bound that is less than or equal to probe. Due to an error > in thinking about the same, the relevant code in > check_new_partition_bound() caused invalid partition (index = -1) > to be chosen as the partition being overlapped. > > Last posted here: > https://www.postgresql.org/message-id/603acb8b-5dec-31e8-29b0-609a68aac50f%40lab.ntt.co.jp } +/* + * If equal has been set to true or if there is no "gap" + * between the bound at off1 and that at off1 + 1, the new + * partition will overlap some partition. In the former + * case, the new lower bound is found to be equal to the + * bound at off1, which could only ever be true if the + * latter is the lower bound of some partition. It's + * clear in such a case that the new partition overlaps + * that partition, whose index we get using its upper + * bound (that is, using the bound at off1 + 1). + */ else Stylistically, we usually avoid this, or at least I do. The comment should go inside the "else" block. But it looks OK apart from that, so committed with a little rephrasing and reformatting of the comment. 
> 0006-Avoid-tuple-coversion-in-common-partitioning-cases.patch > > Currently, the tuple conversion is performed after a tuple is routed, > even if the attributes of a target leaf partition map one-to-one with > those of the root table, which is wasteful. Avoid that by making > convert_tuples_by_name() return a NULL map for such cases. +Assert(!consider_typeid && indesc->tdhasoid == outdesc->tdhasoid); I think you mean Assert(consider_typeid || indesc->tdhasoid == outdesc->tdhasoid); But I wonder why we don't instead just change this function to consider tdhasoid rather than tdtypeid. I mean, if the only point of comparing the type OIDs is to find out whether the table-has-OIDs setting matches, we could instead test that directly and avoid needing to pass an extra argument. I wonder if there's some other reason this code is there which is not documented in the comment... > 0007-Avoid-code-duplication-in-map_partition_varattnos.patch > > Code to map attribute numbers in map_partition_varattnos() duplicates > what convert_tuples_by_name_map() does. Avoid that as pointed out by > Álvaro Herrera. > > Last posted here: > https://www.postgresql.org/message-id/9ce97382-54c8-deb3-9ee9-a
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/19 14:15, Amit Langote wrote: > So, here are all the patches I posted to date (and one new at the bottom) > for reported and unreported bugs, excluding the one involving > BulkInsertState for which you replied in a new thread. > > I'll describe the attached patches in brief: Sorry, I forgot to mention that I have skipped the patch I proposed to modify the committed approach [1] to get the correct tuple to show in the constraint violation messages. It might be better to continue that discussion at [2]. And because I skipped that patch, I should have removed the related logic in ExecWithCheckOptions() that I added in one of the later patches (patch 0004), which I forgot to do. So, here are all the patches again with the correct 0004 this time. Sigh. Thanks, Amit [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f1b4c771ea74f42447dccaed42ffcdcccf3aa694 [2] https://www.postgresql.org/message-id/CA%2BTgmoZjGzSM5WwnyapFaw3GxnDLWh7pm8Xiz8_QWQnUQy%3DSCA%40mail.gmail.com >From 37c861d5eae1c8f11b3fae93cb209da262a13fbc Mon Sep 17 00:00:00 2001 From: amit Date: Tue, 13 Dec 2016 15:07:41 +0900 Subject: [PATCH 1/8] Fix a bug of insertion into an internal partition. Since implicit partition constraints are not inherited, an internal partition's constraint was not being enforced when targeted directly. So, include such constraint when setting up leaf partition result relations for tuple-routing. Reported by: n/a Patch by: Amit Langote Reports: n/a --- src/backend/commands/copy.c | 1 - src/backend/commands/tablecmds.c | 1 - src/backend/executor/execMain.c | 42 src/include/executor/executor.h | 1 - src/test/regress/expected/insert.out | 6 ++ src/test/regress/sql/insert.sql | 5 + 6 files changed, 44 insertions(+), 12 deletions(-) diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 1fd2162794..75386212e0 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -2432,7 +2432,6 @@ CopyFrom(CopyState cstate) InitResultRelInfo(resultRelInfo, cstate->rel, 1, /* dummy rangetable index */ - true, /* do load partition check expression */ NULL, 0); diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index e633a50dd2..06e43cbb3a 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -1324,7 +1324,6 @@ ExecuteTruncate(TruncateStmt *stmt) InitResultRelInfo(resultRelInfo, rel, 0, /* dummy rangetable index */ - false, NULL, 0); resultRelInfo++; diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index ff277d300a..5457f8fbde 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -824,10 +824,10 @@ InitPlan(QueryDesc *queryDesc, int eflags) resultRelationOid = getrelid(resultRelationIndex, rangeTable); resultRelation = heap_open(resultRelationOid, RowExclusiveLock); + InitResultRelInfo(resultRelInfo, resultRelation, resultRelationIndex, - true, NULL, estate->es_instrument); resultRelInfo++; @@ -1218,10 +1218,11 @@ void InitResultRelInfo(ResultRelInfo *resultRelInfo, Relation resultRelationDesc, Index resultRelationIndex, - bool load_partition_check, Relation partition_root, int instrument_options) { + List *partition_check = NIL; + MemSet(resultRelInfo, 0, sizeof(ResultRelInfo)); resultRelInfo->type = T_ResultRelInfo; resultRelInfo->ri_RangeTableIndex = resultRelationIndex; @@ -1257,13 +1258,38 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo, resultRelInfo->ri_ConstraintExprs = NULL; resultRelInfo->ri_junkFilter = NULL; 
resultRelInfo->ri_projectReturning = NULL; - if (load_partition_check) - resultRelInfo->ri_PartitionCheck = - RelationGetPartitionQual(resultRelationDesc); + /* - * The following gets set to NULL unless we are initializing leaf - * partitions for tuple-routing. + * If partition_root has been specified, that means we are builiding the + * ResultRelationInfo for one of its leaf partitions. In that case, we + * need *not* initialize the leaf partition's constraint, but rather the + * the partition_root's (if any). We must do that explicitly like this, + * because implicit partition constraints are not inherited like user- + * defined constraints and would fail to be enforced by ExecConstraints() + * after a tuple is routed to a leaf partition. */ + if (partition_root) + { + /* + * Root table itself may or may not be a partition; partition_check + * would be NIL in the latter case. + */ + partition_check = RelationGetPartitionQual(partition_root); + + /* + * This is not our own partition constraint, but rather an ancestor's. + * So any Vars in it bear the ancestor's attribute numbers. We must + * sw
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/19 5:29, Robert Haas wrote:
> On Wed, Jan 18, 2017 at 3:12 PM, Robert Haas wrote:
>> On Tue, Jan 10, 2017 at 6:06 AM, Amit Langote wrote:
>>> [ updated patches ]
>>
>> I committed 0004 and also fixed the related regression test not to
>> rely on DROP .. CASCADE, which isn't always stable.

Thanks!

>> The remainder of this patch set needs a rebase, and perhaps you could
>> also fold in other pending partitioning fixes so I have everything to
>> look at it in one place.
>
> Just to be a little more clear, I don't mind multiple threads each
> with a patch or patch set so much, but multiple patch sets on the same
> thread gets hard for me to track. Sorry for the inconvenience.

OK, I agree that having multiple patch sets on the same thread is
cumbersome. So, here are all the patches I posted to date (and one new at
the bottom) for reported and unreported bugs, excluding the one involving
BulkInsertState for which you replied in a new thread. I'll describe the
attached patches in brief:

0001-Fix-a-bug-of-insertion-into-an-internal-partition.patch

Since implicit partition constraints are not inherited, an internal
partition's constraint was not being enforced when targeted directly.
So, include such constraint when setting up leaf partition result
relations for tuple-routing.

0002-Set-ecxt_scantuple-correctly-for-tuple-routing.patch

In 2ac3ef7a01df859c62d0a02333b646d65eaec5ff, we changed things so that
it's possible for a different TupleTableSlot to be used for partitioned
tables at successively lower levels. If we do end up changing the slot
from the original, we must update ecxt_scantuple to point to the new one
for partition key of the tuple to be computed correctly.

Last posted here:
https://www.postgresql.org/message-id/99fbfad6-b89b-9427-a6ca-197aad98c48e%40lab.ntt.co.jp

0003-Fix-RETURNING-to-work-correctly-after-tuple-routing.patch

In ExecInsert(), do not switch back to the root partitioned table
ResultRelInfo until after we finish ExecProcessReturning(), so that
RETURNING projection is done using the partition's descriptor. For the
projection to work correctly, we must initialize the same for each leaf
partition during ModifyTableState initialization.

0004-Fix-some-issues-with-views-and-partitioned-tables.patch

Automatically updatable views failed to handle partitioned tables. Once
that's fixed, WITH CHECK OPTIONS wouldn't work correctly without the WCO
expressions having been suitably converted for each partition (think
applying map_partition_varattnos to Vars in the WCO expressions just
like with partition constraint expressions).

0005-Fix-some-wrong-thinking-in-check_new_partition_bound.patch

Because a given range bound in the PartitionBoundInfo.datums array is
sometimes a range lower bound and upper bound at other times, we must be
careful when assuming which, especially when interpreting the result of
partition_bound_bsearch which returns the index of the greatest bound
that is less than or equal to probe. Due to an error in thinking about
the same, the relevant code in check_new_partition_bound() caused
invalid partition (index = -1) to be chosen as the partition being
overlapped.

Last posted here:
https://www.postgresql.org/message-id/603acb8b-5dec-31e8-29b0-609a68aac50f%40lab.ntt.co.jp

0006-Avoid-tuple-coversion-in-common-partitioning-cases.patch

Currently, the tuple conversion is performed after a tuple is routed,
even if the attributes of a target leaf partition map one-to-one with
those of the root table, which is wasteful. Avoid that by making
convert_tuples_by_name() return a NULL map for such cases (a rough
sketch of the identity-map check appears at the end of this message).

0007-Avoid-code-duplication-in-map_partition_varattnos.patch

Code to map attribute numbers in map_partition_varattnos() duplicates
what convert_tuples_by_name_map() does. Avoid that as pointed out by
Álvaro Herrera.

Last posted here:
https://www.postgresql.org/message-id/9ce97382-54c8-deb3-9ee9-a2ec271d866b%40lab.ntt.co.jp

0008-Avoid-DROP-TABLE-.-CASCADE-in-more-partitioning-test.patch

This is the new one. There were quite a few commits recently to fix the
breakage in regression tests due to not using ORDER BY in queries on
system catalogs and using DROP TABLE ... CASCADE. There were still some
instances of the latter in create_table.sql and alter_table.sql.

Thanks,
Amit

>From 37c861d5eae1c8f11b3fae93cb209da262a13fbc Mon Sep 17 00:00:00 2001
From: amit
Date: Tue, 13 Dec 2016 15:07:41 +0900
Subject: [PATCH 1/8] Fix a bug of insertion into an internal partition.

Since implicit partition constraints are not inherited, an internal
partition's constraint was not being enforced when targeted directly.
So, include such constraint when setting up leaf partition result
relations for tuple-routing.

Reported by: n/a
Patch by: Amit Langote
Reports: n/a
---
 src/backend/commands/copy.c | 1 -
 src/backend/commands/tablecmds.c | 1 -
 src/backend/executor/execMain.c | 42
 src/include/executor/executor.h
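Returning to 0006: a rough sketch of the kind of identity-map check that
convert_tuples_by_name() could perform before deciding to build a
conversion map. This is simplified and only illustrative (the real test
would also have to compare attribute counts and account for dropped
columns); the helper name is made up for the example:

#include "postgres.h"
#include "access/attnum.h"

/*
 * Illustrative only: true if the attribute map produced by
 * convert_tuples_by_name_map() is the identity mapping, i.e. output
 * attribute i is sourced from input attribute i.  When that holds (and
 * the descriptors are otherwise compatible), no per-tuple conversion is
 * needed and a NULL map can be returned to the caller.
 */
static bool
attrmap_is_identity(const AttrNumber *attrMap, int nattrs)
{
    int         i;

    for (i = 0; i < nattrs; i++)
    {
        if (attrMap[i] != (AttrNumber) (i + 1))
            return false;
    }
    return true;
}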
Re: [HACKERS] Declarative partitioning - another take
On Wed, Jan 18, 2017 at 3:12 PM, Robert Haas wrote: > On Tue, Jan 10, 2017 at 6:06 AM, Amit Langote > wrote: >> [ updated patches ] > > I committed 0004 and also fixed the related regression test not to > rely on DROP .. CASCADE, which isn't always stable. The remainder of > this patch set needs a rebase, and perhaps you could also fold in > other pending partitioning fixes so I have everything to look at it in > one place. Just to be a little more clear, I don't mind multiple threads each with a patch or patch set so much, but multiple patch sets on the same thread gets hard for me to track. Sorry for the inconvenience. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning vs. BulkInsertState
On Wed, Jan 11, 2017 at 10:53 PM, Amit Langote wrote: > On 2017/01/06 20:23, Amit Langote wrote: >> On 2017/01/05 3:26, Robert Haas wrote: >>> It's unclear to me why we need to do 0002. It doesn't seem like it >>> should be necessary, it doesn't seem like a good idea, and the commit >>> message you proposed is uninformative. >> >> If a single BulkInsertState object is passed to >> heap_insert()/heap_multi_insert() for different heaps corresponding to >> different partitions (from one input tuple to next), tuples might end up >> going into wrong heaps (like demonstrated in one of the reports [1]). A >> simple solution is to disable bulk-insert in case of partitioned tables. >> >> But my patch (or its motivations) was slightly wrongheaded, wherein I >> conflated multi-insert stuff and bulk-insert considerations. I revised >> 0002 to not do that. > > Ragnar Ouchterlony pointed out [1] on pgsql-bugs that 0002 wasn't correct. > Attaching updated 0002 along with rebased 0001 and 0003. The BulkInsertState is not there only to improve performance. It's also there to make sure we use a BufferAccessStrategy, so that we don't trash the whole buffer arena. See commit 85e2cedf985bfecaf43a18ca17433070f439fb0e. If a partitioned table uses a separate BulkInsertState for each partition, I believe it will also end up using a separate ring of buffers for every partition. That may well be faster than copying into an unpartitioned table in some cases, because dirtying everything in the buffer arena without actually writing any of those buffers is a lot faster than actually doing the writes. But it is also anti-social behavior; we have BufferAccessStrategy objects for a reason. One idea would be to have each partition use a separate BulkInsertState but have them point to the same underlying BufferAccessStrategy, but even that's problematic, because it could result in us holding a gigantic number of pins (one per partition). I think maybe a better idea would be to add an additional function ReleaseBulkInsertStatePin() which gets called whenever we switch relations, and then just use the same BulkInsertState throughout. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
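For concreteness, a minimal sketch of what a ReleaseBulkInsertStatePin()
helper might look like, assuming BulkInsertState keeps its currently
pinned buffer in a current_buf field as it does today; this is only an
illustration of the idea, not a committed API:

#include "postgres.h"
#include "access/heapam.h"
#include "access/hio.h"
#include "storage/bufmgr.h"

/*
 * Drop the buffer pin held by a BulkInsertState, without freeing the
 * state or its BufferAccessStrategy, so the same ring of buffers keeps
 * being reused.  The caller would invoke this whenever tuple routing
 * switches to a different partition, ensuring we never hold more than
 * one pinned buffer at a time.
 */
void
ReleaseBulkInsertStatePin(BulkInsertState bistate)
{
    if (bistate->current_buf != InvalidBuffer)
        ReleaseBuffer(bistate->current_buf);
    bistate->current_buf = InvalidBuffer;
}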
Re: [HACKERS] Declarative partitioning - another take
On Tue, Jan 10, 2017 at 6:06 AM, Amit Langote wrote: > [ updated patches ] I committed 0004 and also fixed the related regression test not to rely on DROP .. CASCADE, which isn't always stable. The remainder of this patch set needs a rebase, and perhaps you could also fold in other pending partitioning fixes so I have everything to look at it in one place. Thanks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning vs. information_schema
On Tue, Jan 10, 2017 at 4:17 AM, Amit Langote wrote:
> On 2017/01/10 14:44, Keith Fiske wrote:
>> Is there any reason for the exclusion of parent tables from the pg_tables
>> system catalog view? They do not show up in information_schema.tables as
>> well. I believe I found where to make the changes and I tested to make sure
>> it works for my simple case. Attached is my first attempt at patching
>> anything in core. Not sure if there's anywhere else this would need to be
>> fixed.
>
> That's an oversight. The original partitioning patch didn't touch
> information_schema.sql and system_views.sql at all. I added the relkind =
> 'P' check in some other views as well, including what your patch considered.

I took a look at this patch:

* The SQL for pg_seclabels contained an obvious syntax error.

* pg_seclabels should only return object types that could be used in the
SECURITY LABEL command, so it needs to return 'table' not 'partitioned
table' for the new relkind.

* There's a pointless change from v.relkind = 'v' to v.relkind IN ('v').

* I don't see any indication on the Internet that "PARTITIONED TABLE" is a
legal value for the table type in information_schema.tables. Unless we can
find something official, I suppose we should just display BASE TABLE in
that case as we do in other cases. I wonder if the schema needs some
broader revision; for example, are there information_schema elements
intended to show information about partitions?

* Even though unique/primary key/foreign key constraints can't currently be
defined on partitioned tables, it seems best to change the related
reference symmetrically with the others, because we might support it in the
future and then this would break again. Similarly for key_column_usage.

Committed with fixes for those issues.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On Mon, Jan 16, 2017 at 4:09 AM, Amit Langote wrote: > The problem is that whereas the SlotValueDescription that we build to show > in the error message should be based on the tuple that was passed to > ExecInsert() or whatever NextCopyFrom() returned for CopyFrom() to > process, it might fail to be the case if the tuple was needed to be > converted after tuple routing. slot (the tuple in it and its tuple > descriptor) and resultRelInfo that ExecConstraint() receives *do* > correspond with each other, even after possible tuple conversion following > tuple-routing, and hence constraint checking itself works fine (since > commit 2ac3ef7a01 [1]). As said, it's the val_desc built to show in the > error message being based on the converted-for-partition tuple that could > be seen as a problem - is it acceptable if we showed in the error message > whatever the converted-for-partition tuple looks like which might have > columns ordered differently from the root table? If so, we could simply > forget the whole thing, including reverting f1b4c771 [2]. > > An example: > > create table p (a int, b char, c int) partition by list (a); > create table p1 (b char, c int, a int);-- note the column order > alter table p attach partition p1 for values in (1); > alter table p add constraint check_b check (b = 'x'); > > insert into p values (1, 'y', 1); > ERROR: new row for relation "p1" violates check constraint "check_b" > DETAIL: Failing row contains (y, 1, 1). > > Note that "(y, 1, 1)" results from using p1's descriptor on the converted > tuple. As long that's clear and acceptable, I think we need not worry > about this patch and revert the previously committed patch for this "problem". Hmm. It would be fine, IMHO, if the detail message looked like the one that BuildIndexValueDescription produces. Without the column names, the clarity is somewhat lessened. Anybody else have an opinion on this? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/14 6:24, Robert Haas wrote: > On Tue, Jan 10, 2017 at 6:06 AM, Amit Langote wrote: >> >> Thanks! I realized however that the approach I used in 0002 of passing >> the original slot to ExecConstraints() fails in certain situations. For >> example, if a BR trigger changes the tuple, the original slot would not >> receive those changes, so it will be wrong to use such a tuple anymore. >> In attached 0001, I switched back to the approach of converting the >> partition-tupdesc-based tuple back to the root partitioned table's format. >> The converted tuple contains the changes by BR triggers, if any. Sorry >> about some unnecessary work. > > Hmm. Even with this patch, I wonder if this is really correct. I > mean, isn't the root of the problem here that ExecConstraints() is > expecting that resultRelInfo matches slot, and it doesn't? The problem is that whereas the SlotValueDescription that we build to show in the error message should be based on the tuple that was passed to ExecInsert() or whatever NextCopyFrom() returned for CopyFrom() to process, it might fail to be the case if the tuple was needed to be converted after tuple routing. slot (the tuple in it and its tuple descriptor) and resultRelInfo that ExecConstraint() receives *do* correspond with each other, even after possible tuple conversion following tuple-routing, and hence constraint checking itself works fine (since commit 2ac3ef7a01 [1]). As said, it's the val_desc built to show in the error message being based on the converted-for-partition tuple that could be seen as a problem - is it acceptable if we showed in the error message whatever the converted-for-partition tuple looks like which might have columns ordered differently from the root table? If so, we could simply forget the whole thing, including reverting f1b4c771 [2]. An example: create table p (a int, b char, c int) partition by list (a); create table p1 (b char, c int, a int);-- note the column order alter table p attach partition p1 for values in (1); alter table p add constraint check_b check (b = 'x'); insert into p values (1, 'y', 1); ERROR: new row for relation "p1" violates check constraint "check_b" DETAIL: Failing row contains (y, 1, 1). Note that "(y, 1, 1)" results from using p1's descriptor on the converted tuple. As long that's clear and acceptable, I think we need not worry about this patch and revert the previously committed patch for this "problem". > And why > isn't that also a problem for the things that get passed resultRelInfo > and slot after tuple routing and before ExecConstraints? In > particular, in copy.c, ExecBRInsertTriggers. If I explained the problem (as I'm seeing it) well enough above, you may see why that's not an issue in other cases. Other sub-routines viz. ExecBRInsertTriggers, ExecWithCheckOptions for RLS INSERT WITH CHECK policies, ExecARInsertTriggers, etc. do receive the correct slot and resultRelInfo for whatever they do with a given tuple and the relation (partition) it is being inserted into. However, if we do consider the above to be a problem, then we would need to fix ExecWithCheckOptions() to do whatever we decide ExecConstraints() should do, because it can also report WITH CHECK violation for a tuple. > Committed 0002. 
Thanks, Amit [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2ac3ef7 [2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f1b4c77 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
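To illustrate the approach under discussion (converting the routed,
partition-format tuple back to the root table's row type before building
val_desc, so that the DETAIL column order matches what the user supplied),
here is a hypothetical sketch; the helper name, and the assumption that a
partition-to-root TupleConversionMap plus a spare root-format slot are
available at the call site, are purely illustrative:

#include "postgres.h"
#include "access/tupconvert.h"
#include "executor/tuptable.h"

/*
 * Illustrative only: pick the slot whose contents should be shown in a
 * constraint-violation error message.  If a partition-to-root conversion
 * map exists, convert the tuple back to the root table's format so that
 * the reported column order matches the INSERT/COPY input.
 */
static TupleTableSlot *
slot_for_error_report(TupleTableSlot *slot, TupleTableSlot *root_slot,
                      TupleConversionMap *map)
{
    if (map != NULL)
    {
        HeapTuple   tuple = ExecFetchSlotTuple(slot);

        tuple = do_convert_tuple(tuple, map);
        ExecStoreTuple(tuple, root_slot, InvalidBuffer, true);
        return root_slot;
    }

    return slot;
}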
Re: [HACKERS] Declarative partitioning - another take
On Tue, Jan 10, 2017 at 6:06 AM, Amit Langote wrote: > On 2017/01/05 5:50, Robert Haas wrote: >> On Tue, Dec 27, 2016 at 3:59 AM, Amit Langote >> wrote: >>> Patches 0001 to 0006 unchanged. >> >> Committed 0001 earlier, as mentioned in a separate email. Committed >> 0002 and part of 0003. > > Thanks! I realized however that the approach I used in 0002 of passing > the original slot to ExecConstraints() fails in certain situations. For > example, if a BR trigger changes the tuple, the original slot would not > receive those changes, so it will be wrong to use such a tuple anymore. > In attached 0001, I switched back to the approach of converting the > partition-tupdesc-based tuple back to the root partitioned table's format. > The converted tuple contains the changes by BR triggers, if any. Sorry > about some unnecessary work. Hmm. Even with this patch, I wonder if this is really correct. I mean, isn't the root of the problem here that ExecConstraints() is expecting that resultRelInfo matches slot, and it doesn't? And why isn't that also a problem for the things that get passed resultRelInfo and slot after tuple routing and before ExecConstraints? In particular, in copy.c, ExecBRInsertTriggers. Committed 0002. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/06 20:23, Amit Langote wrote: > On 2017/01/05 3:26, Robert Haas wrote: >> It's unclear to me why we need to do 0002. It doesn't seem like it >> should be necessary, it doesn't seem like a good idea, and the commit >> message you proposed is uninformative. > > If a single BulkInsertState object is passed to > heap_insert()/heap_multi_insert() for different heaps corresponding to > different partitions (from one input tuple to next), tuples might end up > going into wrong heaps (like demonstrated in one of the reports [1]). A > simple solution is to disable bulk-insert in case of partitioned tables. > > But my patch (or its motivations) was slightly wrongheaded, wherein I > conflated multi-insert stuff and bulk-insert considerations. I revised > 0002 to not do that. Ragnar Ouchterlony pointed out [1] on pgsql-bugs that 0002 wasn't correct. Attaching updated 0002 along with rebased 0001 and 0003. Thanks, Amit [1] https://www.postgresql.org/message-id/732dfc84-25f5-413c-1eee-0bfa7a370093%40agama.tv >From f2a64348021c7dba1f96d0c8b4e3e253f635b019 Mon Sep 17 00:00:00 2001 From: amit Date: Wed, 28 Dec 2016 10:10:26 +0900 Subject: [PATCH 1/3] Set ecxt_scantuple correctly for tuple-routing In 2ac3ef7a01df859c62d0a02333b646d65eaec5ff, we changed things so that it's possible for a different TupleTableSlot to be used for partitioned tables at successively lower levels. If we do end up changing the slot from the original, we must update ecxt_scantuple to point to the new one for partition key of the tuple to be computed correctly. Also update the regression tests so that the code manipulating ecxt_scantuple is covered. Reported by: Rajkumar Raghuwanshi Patch by: Amit Langote Reports: https://www.postgresql.org/message-id/CAKcux6%3Dm1qyqB2k6cjniuMMrYXb75O-MB4qGQMu8zg-iGGLjDw%40mail.gmail.com --- src/backend/catalog/partition.c | 29 ++--- src/backend/executor/execMain.c | 2 -- src/test/regress/expected/insert.out | 2 +- src/test/regress/sql/insert.sql | 2 +- 4 files changed, 24 insertions(+), 11 deletions(-) diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index f54e1bdf3f..0de1cf245a 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -1643,7 +1643,10 @@ get_partition_for_tuple(PartitionDispatch *pd, bool isnull[PARTITION_MAX_KEYS]; int cur_offset, cur_index; - int i; + int i, +result; + ExprContext *ecxt = GetPerTupleExprContext(estate); + TupleTableSlot *ecxt_scantuple_old = ecxt->ecxt_scantuple; /* start with the root partitioned table */ parent = pd[0]; @@ -1672,7 +1675,14 @@ get_partition_for_tuple(PartitionDispatch *pd, slot = myslot; } - /* Extract partition key from tuple */ + /* + * Extract partition key from tuple; FormPartitionKeyDatum() expects + * ecxt_scantuple to point to the correct tuple slot (which might be + * different from the slot we received from the caller if the + * partitioned table of the current level has different tuple + * descriptor from its parent). 
+ */ + ecxt->ecxt_scantuple = slot; FormPartitionKeyDatum(parent, slot, estate, values, isnull); if (key->strategy == PARTITION_STRATEGY_RANGE) @@ -1727,16 +1737,21 @@ get_partition_for_tuple(PartitionDispatch *pd, */ if (cur_index < 0) { + result = -1; *failed_at = RelationGetRelid(parent->reldesc); - return -1; + break; } - else if (parent->indexes[cur_index] < 0) - parent = pd[-parent->indexes[cur_index]]; - else + else if (parent->indexes[cur_index] >= 0) + { + result = parent->indexes[cur_index]; break; + } + else + parent = pd[-parent->indexes[cur_index]]; } - return parent->indexes[cur_index]; + ecxt->ecxt_scantuple = ecxt_scantuple_old; + return result; } /* diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index ff277d300a..6a9bc8372f 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -3167,9 +3167,7 @@ ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd, { int result; Oid failed_at; - ExprContext *econtext = GetPerTupleExprContext(estate); - econtext->ecxt_scantuple = slot; result = get_partition_for_tuple(pd, slot, estate, &failed_at); if (result < 0) { diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out index ca3134c34c..1c7b8047ee 100644 --- a/src/test/regress/expected/insert.out +++ b/src/test/regress/expected/insert.out @@ -302,7 +302,7 @@ drop cascades to table part_ee_ff1 drop cascades to table part_ee_ff2 -- more tests for certain multi-level partitioning scenarios create table p (a int, b int) partition by range (a, b); -create table p1 (b int, a int not null) partition by range (b); +create table p1 (b int not null, a int not null) partition by range ((b+0)); create table p11 (like p1); alter table p11 drop a; alter table p11 add a int; diff --git a/src
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/05 5:50, Robert Haas wrote: > On Tue, Dec 27, 2016 at 3:59 AM, Amit Langote > wrote: >> Patches 0001 to 0006 unchanged. > > Committed 0001 earlier, as mentioned in a separate email. Committed > 0002 and part of 0003. Thanks! I realized however that the approach I used in 0002 of passing the original slot to ExecConstraints() fails in certain situations. For example, if a BR trigger changes the tuple, the original slot would not receive those changes, so it will be wrong to use such a tuple anymore. In attached 0001, I switched back to the approach of converting the partition-tupdesc-based tuple back to the root partitioned table's format. The converted tuple contains the changes by BR triggers, if any. Sorry about some unnecessary work. > But I'm skeptical that the as-patched-by-0003 > logic in generate_partition_qual() makes sense. You do this: > > result = list_concat(generate_partition_qual(parent), > copyObject(rel->rd_partcheck)); > > /* Mark Vars with correct attnos */ > result = map_partition_varattnos(result, rel, parent); > > But that has the effect of applying map_partition_varattnos to > everything in rel->rd_partcheck in addition to applying it to > everything returned by generate_partition_qual() on the parent, which > doesn't seem right. I've replaced this portion of the code with (as also mentioned below): /* Quick copy */ if (rel->rd_partcheck != NIL) return copyObject(rel->rd_partcheck); Down below (for the case when the partition qual is not cached, we now do this: my_qual = get_qual_from_partbound(rel, parent, bound); /* Add the parent's quals to the list (if any) */ if (parent->rd_rel->relispartition) result = list_concat(generate_partition_qual(parent), my_qual); else result = my_qual; /* * Change Vars to have partition's attnos instead of the parent's. * We do this after we concatenate the parent's quals, because * we want every Var in it to bear this relation's attnos. */ result = map_partition_varattnos(result, rel, parent); Which is then cached wholly in rd_partcheck. As for your concern whether it's correct to do so, consider that doing generate_partition_qual() on the parent returns qual with Vars that bear the parent's attnos (which is OK as far parent as partition is concerned). To apply the qual to the current relation as partition, we must change the Vars to have this relation's attnos. > Also, don't we want to do map_partition_varattnos() just ONCE, rather > than on every call to this function? I think maybe your concern is > that the parent might be changed without a relcache flush on the > child, but I don't quite see how that could happen. If the parent's > tuple descriptor changes, surely the child's tuple descriptor would > have to be altered at the same time... Makes sense. I fixed so that we return copyObject(rel->rd_partcheck), if it's non-NIL, instead of generating parent's qual and doing the mapping again. For some reason, I thought we couldn't save the mapped version in the relcache. By the way, in addition to the previously mentioned bug of RETURNING, I found that WITH CHECK OPTION didn't work correctly as well. In fact automatically updatable views failed to consider partitioned tables at all. Patch 0007 is addressed towards fixing that. 
Thanks, Amit >From e408234633c01817d6a2313fdbdccdb4f0057c1e Mon Sep 17 00:00:00 2001 From: amit Date: Fri, 6 Jan 2017 15:53:10 +0900 Subject: [PATCH 1/7] Fix reporting of violation in ExecConstraints, again We decided in f1b4c771ea74f42447dccaed42ffcdcccf3aa694 that passing the original slot (one containing the tuple formatted per root partitioned table's tupdesc) to ExecConstraints(), but that breaks certain cases. Imagine what would happen if a BR trigger changed the tuple - the original slot would not contain those changes. So, it seems better to convert (if necessary) the tuple formatted per partition tupdesc after tuple-routing back to the root table's format and use the converted tuple to make val_desc shown in the message if an error occurs. --- src/backend/commands/copy.c| 6 ++-- src/backend/executor/execMain.c| 53 +- src/backend/executor/nodeModifyTable.c | 5 ++-- src/include/executor/executor.h| 3 +- src/test/regress/expected/insert.out | 18 ++-- src/test/regress/sql/insert.sql| 17 ++- 6 files changed, 82 insertions(+), 20 deletions(-) diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index f56b2ac49b..65eb167087 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -2491,8 +2491,7 @@ CopyFrom(CopyState cstate) for (;;) { - TupleTableSlot *slot, - *oldslot; + TupleTableSlot *slot; bool skip_tuple; Oid loaded_oid = InvalidOid; @@ -2534,7 +2533,6 @@ CopyFrom(CopyState cstate) ExecStore
Re: [HACKERS] Declarative partitioning - another take
Hi Kieth, On 2017/01/10 14:44, Keith Fiske wrote: > Is there any reason for the exclusion of parent tables from the pg_tables > system catalog view? They do not show up in information_schema.tables as > well. I believe I found where to make the changes and I tested to make sure > it works for my simple case. Attached is my first attempt at patching > anything in core. Not sure if there's anywhere else this would need to be > fixed. That's an oversight. The original partitioning patch didn't touch information_schema.sql and system_views.sql at all. I added the relkind = 'P' check in some other views as well, including what your patch considered. Thanks, Amit >From 9eef3e87b9a025d233aa4b935b50bb0c7633efbb Mon Sep 17 00:00:00 2001 From: amit Date: Tue, 10 Jan 2017 18:12:25 +0900 Subject: [PATCH] information_schema/system_views.sql and relkind 'P' Currently, partitioned table are not taken into account in various information_schema and system views. Reported by: Keith Fiske Patch by: Kieth Fiske, Amit Langote Reports: https://www.postgresql.org/message-id/CAG1_KcDJiZB=l6youo_bvufj2q2851_xdkfhw0jdcd_2vtk...@mail.gmail.com --- src/backend/catalog/information_schema.sql | 37 +++--- src/backend/catalog/system_views.sql | 3 ++- 2 files changed, 21 insertions(+), 19 deletions(-) diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql index 4df390a763..318f195b81 100644 --- a/src/backend/catalog/information_schema.sql +++ b/src/backend/catalog/information_schema.sql @@ -453,7 +453,7 @@ CREATE VIEW check_constraints AS AND a.attnum > 0 AND NOT a.attisdropped AND a.attnotnull - AND r.relkind = 'r' + AND r.relkind IN ('r', 'P') AND pg_has_role(r.relowner, 'USAGE'); GRANT SELECT ON check_constraints TO PUBLIC; @@ -525,7 +525,7 @@ CREATE VIEW column_domain_usage AS AND a.attrelid = c.oid AND a.atttypid = t.oid AND t.typtype = 'd' - AND c.relkind IN ('r', 'v', 'f') + AND c.relkind IN ('r', 'v', 'f', 'P') AND a.attnum > 0 AND NOT a.attisdropped AND pg_has_role(t.typowner, 'USAGE'); @@ -564,7 +564,7 @@ CREATE VIEW column_privileges AS pr_c.relowner FROM (SELECT oid, relname, relnamespace, relowner, (aclexplode(coalesce(relacl, acldefault('r', relowner.* FROM pg_class - WHERE relkind IN ('r', 'v', 'f') + WHERE relkind IN ('r', 'v', 'f', 'P') ) pr_c (oid, relname, relnamespace, relowner, grantor, grantee, prtype, grantable), pg_attribute a WHERE a.attrelid = pr_c.oid @@ -586,7 +586,7 @@ CREATE VIEW column_privileges AS ) pr_a (attrelid, attname, grantor, grantee, prtype, grantable), pg_class c WHERE pr_a.attrelid = c.oid - AND relkind IN ('r', 'v', 'f') + AND relkind IN ('r', 'v', 'f', 'P') ) x, pg_namespace nc, pg_authid u_grantor, @@ -629,7 +629,7 @@ CREATE VIEW column_udt_usage AS WHERE a.attrelid = c.oid AND a.atttypid = t.oid AND nc.oid = c.relnamespace - AND a.attnum > 0 AND NOT a.attisdropped AND c.relkind in ('r', 'v', 'f') + AND a.attnum > 0 AND NOT a.attisdropped AND c.relkind in ('r', 'v', 'f', 'P') AND pg_has_role(coalesce(bt.typowner, t.typowner), 'USAGE'); GRANT SELECT ON column_udt_usage TO PUBLIC; @@ -738,7 +738,7 @@ CREATE VIEW columns AS CAST('NEVER' AS character_data) AS is_generated, CAST(null AS character_data) AS generation_expression, - CAST(CASE WHEN c.relkind = 'r' OR + CAST(CASE WHEN c.relkind IN ('r', 'P') OR (c.relkind IN ('v', 'f') AND pg_column_is_updatable(c.oid, a.attnum, false)) THEN 'YES' ELSE 'NO' END AS yes_or_no) AS is_updatable @@ -753,7 +753,7 @@ CREATE VIEW columns AS WHERE (NOT pg_is_other_temp_schema(nc.oid)) - AND a.attnum 
> 0 AND NOT a.attisdropped AND c.relkind in ('r', 'v', 'f') + AND a.attnum > 0 AND NOT a.attisdropped AND c.relkind in ('r', 'v', 'f', 'P') AND (pg_has_role(c.relowner, 'USAGE') OR has_column_privilege(c.oid, a.attnum, @@ -789,7 +789,7 @@ CREATE VIEW constraint_column_usage AS AND d.objid = c.oid AND c.connamespace = nc.oid AND c.contype = 'c' -AND r.relkind = 'r' +AND r.relkind IN ('r', 'P') AND NOT a.attisdropped UNION ALL @@ -841,7 +841,7 @@ CREATE VIEW constraint_table_usage AS WHERE c.connamespace = nc.oid AND r.relnamespace = nr.oid AND ( (c.contype = 'f' AND c.confrelid = r.oid) OR (c.contype IN ('p', 'u') AND c.conrelid = r.oid) ) - AND r.relkind = 'r' +
Re: [HACKERS] Declarative partitioning - another take
Hi Amul, On 2017/01/09 17:29, amul sul wrote: > I got server crash due to assert failure at ATTACHing overlap rang > partition, here is test case to reproduce this: > > CREATE TABLE test_parent(a int) PARTITION BY RANGE (a); > CREATE TABLE test_parent_part2 PARTITION OF test_parent FOR VALUES > FROM(100) TO(200); > CREATE TABLE test_parent_part1(a int NOT NULL); > ALTER TABLE test_parent ATTACH PARTITION test_parent_part1 FOR VALUES > FROM(1) TO(200); > > I think, this bug exists in the following code of check_new_partition_bound(): > > 767 if (equal || off1 != off2) > 768 { > 769 overlap = true; > 770 with = boundinfo->indexes[off2 + 1]; > 771 } > > When equal is true array index should not be 'off2 + 1'. Good catch. Attached patch should fix that. I observed crash with the following command as well: ALTER TABLE test_parent ATTACH PARTITION test_parent_part1 FOR VALUES FROM (1) TO (300); That's because there is one more case when the array index shouldn't be off2 + 1 - the case where the bound at off2 is an upper bound (I'd wrongly assumed that it's always a lower bound). Anyway, I rewrote the surrounding comments to clarify the logic a bit. > While reading code related to this, I wondered why > partition_bound_bsearch is not immediately returns when cmpval==0? partition_bound_bsearch() is meant to return the *greatest* index of the bound less than or equal to the input bound ("probe"). But it seems to me now that we would always return the first index at which we get 0 for cmpval, albeit after wasting cycles to try to find even greater index. Because we don't have duplicates in the datums array, once we encounter a bound that is equal to probe, we are only going to find bounds that are *greater than* probe if we continue looking right, only to turn back again to return the equal index (which is wasted cycles in invoking the partition key comparison function(s)). So, it perhaps makes sense to do this per your suggestion: @@ -1988,8 +2018,11 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo, if (cmpval <= 0) { lo = mid; *is_equal = (cmpval == 0); + +if (*is_equal) +break; } Thanks, Amit >From 7fe537f8e8efeac51c3b0cc91ac51a1aa39399cd Mon Sep 17 00:00:00 2001 From: amit Date: Tue, 10 Jan 2017 17:43:36 +0900 Subject: [PATCH] Fix some wrong thinking in check_new_partition_bound() Because a given range bound in the PartitionBoundInfo.datums array is sometimes a lower range bound and at other times an upper range bound, we must be careful when assuming which, especially when interpreting the result of partition_bound_bsearch which returns the index of the greatest bound that is less than or equal to probe. Due to an error in thinking about the same, the relevant code in check_new_partition_bound() caused invalid partition (index == -1) to be chosen as the partition being overlapped. Also, we need not continue searching for even greater bound in partition_bound_bsearch() once we find the first bound that is *equal* to the probe, because we don't have duplicate datums. That spends cycles needlessly. Per suggestion from Amul Sul. 
Reported by: Amul Sul Patch by: Amit Langote Reports: https://www.postgresql.org/message-id/CAAJ_b94XgbqVoXMyxxs63CaqWoMS1o2gpHiU0F7yGnJBnvDc_A%40mail.gmail.com --- src/backend/catalog/partition.c| 62 ++ src/test/regress/expected/create_table.out | 10 - src/test/regress/sql/create_table.sql | 4 ++ 3 files changed, 60 insertions(+), 16 deletions(-) diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index f54e1bdf3f..df5652de4c 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -741,35 +741,64 @@ check_new_partition_bound(char *relname, Relation parent, Node *bound) boundinfo->strategy == PARTITION_STRATEGY_RANGE); /* - * Find the greatest index of a range bound that is less - * than or equal with the new lower bound. + * Firstly, find the greatest range bound that is less + * than or equal to the new lower bound. */ off1 = partition_bound_bsearch(key, boundinfo, lower, true, &equal); /* - * If equal has been set to true, that means the new lower - * bound is found to be equal with the bound at off1, - * which clearly means an overlap with the partition at - * index off1+1). - * - * Otherwise, check if there is a "gap" that could be - * occupied by the new partition. In case of a gap, the - * new upper bound should not cross past the upper - * boundary of the gap, that is, off2 == off1 should be - * true. + * off1 == -1 means that all existing bounds are greater + * than the new lower
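As a standalone illustration of the search convention involved here (find
the greatest index whose bound is less than or equal to the probe, with an
early exit on equality that is safe only because the datums array contains
no duplicates), here is a small self-contained sketch in plain C; the names
and the use of plain integers instead of partition bounds are illustrative,
and this is not the actual partition_bound_bsearch() code:

#include <stdbool.h>

/*
 * Return the greatest index i such that arr[i] <= probe, or -1 if there
 * is no such element.  Because the sorted array has no duplicates, we can
 * stop as soon as we hit an exact match: every greater index holds a
 * strictly greater value, so the answer cannot change by searching on.
 */
static int
bsearch_le(const int *arr, int nelems, int probe, bool *is_equal)
{
    int     lo = -1;
    int     hi = nelems - 1;

    *is_equal = false;
    while (lo < hi)
    {
        int     mid = (lo + hi + 1) / 2;    /* round up so the loop advances */

        if (arr[mid] <= probe)
        {
            lo = mid;
            *is_equal = (arr[mid] == probe);
            if (*is_equal)
                break;          /* no duplicates, so this index is final */
        }
        else
            hi = mid - 1;
    }

    return lo;
}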
Re: [HACKERS] Declarative partitioning - another take
Is there any reason for the exclusion of parent tables from the pg_tables system catalog view? They do not show up in information_schema.tables as well. I believe I found where to make the changes and I tested to make sure it works for my simple case. Attached is my first attempt at patching anything in core. Not sure if there's anywhere else this would need to be fixed. -- Keith Fiske Database Administrator OmniTI Computer Consulting, Inc. http://www.keithf4.com diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql index 4df390a..c31d0d8 100644 --- a/src/backend/catalog/information_schema.sql +++ b/src/backend/catalog/information_schema.sql @@ -1901,6 +1901,7 @@ CREATE VIEW tables AS WHEN c.relkind = 'r' THEN 'BASE TABLE' WHEN c.relkind = 'v' THEN 'VIEW' WHEN c.relkind = 'f' THEN 'FOREIGN TABLE' + WHEN c.relkind = 'P' THEN 'PARTITIONED TABLE' ELSE null END AS character_data) AS table_type, @@ -1912,7 +1913,7 @@ CREATE VIEW tables AS CAST(t.typname AS sql_identifier) AS user_defined_type_name, CAST(CASE WHEN c.relkind = 'r' OR - (c.relkind IN ('v', 'f') AND + (c.relkind IN ('v', 'f', 'P') AND -- 1 << CMD_INSERT pg_relation_is_updatable(c.oid, false) & 8 = 8) THEN 'YES' ELSE 'NO' END AS yes_or_no) AS is_insertable_into, @@ -1923,7 +1924,7 @@ CREATE VIEW tables AS FROM pg_namespace nc JOIN pg_class c ON (nc.oid = c.relnamespace) LEFT JOIN (pg_type t JOIN pg_namespace nt ON (t.typnamespace = nt.oid)) ON (c.reloftype = t.oid) -WHERE c.relkind IN ('r', 'v', 'f') +WHERE c.relkind IN ('r', 'v', 'f', 'P') AND (NOT pg_is_other_temp_schema(nc.oid)) AND (pg_has_role(c.relowner, 'USAGE') OR has_table_privilege(c.oid, 'SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER') diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 31aade1..f4dc460 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -136,7 +136,7 @@ CREATE VIEW pg_tables AS C.relrowsecurity AS rowsecurity FROM pg_class C LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace) LEFT JOIN pg_tablespace T ON (T.oid = C.reltablespace) -WHERE C.relkind = 'r'; +WHERE C.relkind = 'r' OR C.relkind = 'P'; CREATE VIEW pg_matviews AS SELECT -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
Hi,

I got a server crash due to an assert failure when ATTACHing an overlapping
range partition; here is a test case to reproduce it:

CREATE TABLE test_parent(a int) PARTITION BY RANGE (a);
CREATE TABLE test_parent_part2 PARTITION OF test_parent FOR VALUES FROM(100) TO(200);
CREATE TABLE test_parent_part1(a int NOT NULL);
ALTER TABLE test_parent ATTACH PARTITION test_parent_part1 FOR VALUES FROM(1) TO(200);

I think this bug exists in the following code of check_new_partition_bound():

 767            if (equal || off1 != off2)
 768            {
 769                overlap = true;
 770                with = boundinfo->indexes[off2 + 1];
 771            }

When equal is true, the array index should not be 'off2 + 1'.

While reading the code related to this, I wondered why
partition_bound_bsearch does not return immediately when cmpval==0?

Apologies if this has already been reported.

Regards,
Amul

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/05 3:26, Robert Haas wrote: > On Tue, Dec 27, 2016 at 8:41 PM, Amit Langote > wrote: >> On 2016/12/27 19:07, Amit Langote wrote: >>> Attached should fix that. >> >> Here are the last two patches with additional information like other >> patches. Forgot to do that yesterday. > > 0001 has the disadvantage that get_partition_for_tuple() acquires a > side effect. That seems undesirable. At the least, it needs to be > documented in the function's header comment. That's true. How about we save away the original ecxt_scantuple at entry and restore the same just before returning from the function? That way there would be no side effect. 0001 implements that. > It's unclear to me why we need to do 0002. It doesn't seem like it > should be necessary, it doesn't seem like a good idea, and the commit > message you proposed is uninformative. If a single BulkInsertState object is passed to heap_insert()/heap_multi_insert() for different heaps corresponding to different partitions (from one input tuple to next), tuples might end up going into wrong heaps (like demonstrated in one of the reports [1]). A simple solution is to disable bulk-insert in case of partitioned tables. But my patch (or its motivations) was slightly wrongheaded, wherein I conflated multi-insert stuff and bulk-insert considerations. I revised 0002 to not do that. However if we disable bulk-insert mode, COPY's purported performance benefit compared with INSERT is naught. Patch 0003 is a proposal to implement bulk-insert mode even for partitioned tables. Basically, allocate separate BulkInsertState objects for each partition and switch to the appropriate one just before calling heap_insert()/heap_multi_insert(). Then to be able to use heap_multi_insert(), we must also manage buffered tuples separately for each partition. Although, I didn't modify the limit on number of buffered tuples and/or size of buffered tuples which controls when we pause buffering and do heap_multi_insert() on buffered tuples. Maybe, it should work slightly differently for the partitioned table case, like for example, increase the overall limit on both the number of tuples and tuple size in the partitioning case (I observed that increasing it 10x or 100x helped to some degree). Thoughts on this? Thanks, Amit [1] https://www.postgresql.org/message-id/CAFmBtr32FDOqofo8yG-4mjzL1HnYHxXK5S9OGFJ%3D%3DcJpgEW4vA%40mail.gmail.com >From 332c67c258a0f25f76c29ced23199fe0ee8e153e Mon Sep 17 00:00:00 2001 From: amit Date: Wed, 28 Dec 2016 10:10:26 +0900 Subject: [PATCH 1/3] Set ecxt_scantuple correctly for tuple-routing In 2ac3ef7a01df859c62d0a02333b646d65eaec5ff, we changed things so that it's possible for a different TupleTableSlot to be used for partitioned tables at successively lower levels. If we do end up changing the slot from the original, we must update ecxt_scantuple to point to the new one for partition key of the tuple to be computed correctly. Also update the regression tests so that the code manipulating ecxt_scantuple is covered. 
Reported by: Rajkumar Raghuwanshi Patch by: Amit Langote Reports: https://www.postgresql.org/message-id/CAKcux6%3Dm1qyqB2k6cjniuMMrYXb75O-MB4qGQMu8zg-iGGLjDw%40mail.gmail.com --- src/backend/catalog/partition.c | 29 ++--- src/backend/executor/execMain.c | 2 -- src/test/regress/expected/insert.out | 2 +- src/test/regress/sql/insert.sql | 2 +- 4 files changed, 24 insertions(+), 11 deletions(-) diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index f54e1bdf3f..0de1cf245a 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -1643,7 +1643,10 @@ get_partition_for_tuple(PartitionDispatch *pd, bool isnull[PARTITION_MAX_KEYS]; int cur_offset, cur_index; - int i; + int i, +result; + ExprContext *ecxt = GetPerTupleExprContext(estate); + TupleTableSlot *ecxt_scantuple_old = ecxt->ecxt_scantuple; /* start with the root partitioned table */ parent = pd[0]; @@ -1672,7 +1675,14 @@ get_partition_for_tuple(PartitionDispatch *pd, slot = myslot; } - /* Extract partition key from tuple */ + /* + * Extract partition key from tuple; FormPartitionKeyDatum() expects + * ecxt_scantuple to point to the correct tuple slot (which might be + * different from the slot we received from the caller if the + * partitioned table of the current level has different tuple + * descriptor from its parent). + */ + ecxt->ecxt_scantuple = slot; FormPartitionKeyDatum(parent, slot, estate, values, isnull); if (key->strategy == PARTITION_STRATEGY_RANGE) @@ -1727,16 +1737,21 @@ get_partition_for_tuple(PartitionDispatch *pd, */ if (cur_index < 0) { + result = -1; *failed_at = RelationGetRelid(parent->reldesc); - return -1; + break; } - else if (parent->indexes[cur_index] < 0) - parent = pd[-parent->indexes[cur_index]]; - else + else if (parent->indexes[cur_index] >= 0)
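As a rough sketch of the per-partition bulk-insert idea in 0003, assuming
leaf partitions are identified by their index in the tuple-routing array
and that the caller (e.g. CopyFrom) allocates a zero-initialized array with
one entry per leaf partition up front; the helper name and array layout are
illustrative only:

#include "postgres.h"
#include "access/heapam.h"

/*
 * Illustrative only: lazily create one BulkInsertState per leaf partition
 * and return the one for the partition a tuple was just routed to.  The
 * returned state is then passed to heap_insert()/heap_multi_insert() for
 * that partition; each allocated state must eventually be released with
 * FreeBulkInsertState() once the copy finishes.
 */
static BulkInsertState
get_partition_bulk_insert_state(BulkInsertState *bistates, int leaf_index)
{
    if (bistates[leaf_index] == NULL)
        bistates[leaf_index] = GetBulkInsertState();

    return bistates[leaf_index];
}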
Re: [HACKERS] Declarative partitioning - another take
Hi Keith, On 2017/01/06 2:16, Keith Fiske wrote: > Could we get some clarification on the partition_bound_spec portion of the > PARTITION OF clause? Just doing some testing it seems it's inclusive of the > FROM value but exclusive of the TO value. I don't see mention of this in > the docs as of commit 18fc5192a631441a73e6a3b911ecb14765140389 yesterday. > It does mention that the values aren't allowed to overlap, but looking at > the schema below, without the clarification of which side is > inclusive/exclusive it seems confusing because 2016-08-01 is in both. Even > the child table does not clarify this. Not sure if there's a way to do this > in the \d+ display which would be ideal, but it should at least be > mentioned in the docs. I agree that needs highlighting. I'm planning to write a doc patch for that (among other documentation improvements). Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
Could we get some clarification on the partition_bound_spec portion of the
PARTITION OF clause? Just doing some testing it seems it's inclusive of the
FROM value but exclusive of the TO value. I don't see mention of this in
the docs as of commit 18fc5192a631441a73e6a3b911ecb14765140389 yesterday.
It does mention that the values aren't allowed to overlap, but looking at
the schema below, without the clarification of which side is
inclusive/exclusive it seems confusing because 2016-08-01 is in both. Even
the child table does not clarify this. Not sure if there's a way to do this
in the \d+ display which would be ideal, but it should at least be
mentioned in the docs.

keith@keith=# \d+ measurement
                               Table "public.measurement"
  Column   |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
-----------+---------+-----------+----------+---------+---------+--------------+-------------
 logdate   | date    |           | not null |         | plain   |              |
 peaktemp  | integer |           |          | 1       | plain   |              |
 unitsales | integer |           |          |         | plain   |              |
Partition key: RANGE (logdate)
Check constraints:
    "measurement_peaktemp_check" CHECK (peaktemp > 0)
Partitions: measurement_y2016m07 FOR VALUES FROM ('2016-07-01') TO ('2016-08-01'),
            measurement_y2016m08 FOR VALUES FROM ('2016-08-01') TO ('2016-09-01')

keith@keith=# \d+ measurement_y2016m07
                           Table "public.measurement_y2016m07"
  Column   |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
-----------+---------+-----------+----------+---------+---------+--------------+-------------
 logdate   | date    |           | not null |         | plain   |              |
 peaktemp  | integer |           |          | 1       | plain   |              |
 unitsales | integer |           |          | 0       | plain   |              |
Partition of: measurement FOR VALUES FROM ('2016-07-01') TO ('2016-08-01')
Check constraints:
    "measurement_peaktemp_check" CHECK (peaktemp > 0)

keith@keith=# insert into measurement (logdate) values ('2016-08-01');
INSERT 0 1
Time: 2.848 ms

keith@keith=# select * from measurement_y2016m07;
 logdate | peaktemp | unitsales
---------+----------+-----------
(0 rows)

Time: 0.273 ms

keith@keith=# select * from measurement_y2016m08;
  logdate   | peaktemp | unitsales
------------+----------+-----------
 2016-08-01 |        1 |    «NULL»
(1 row)

Time: 0.272 ms

keith@keith=# drop table measurement_y2016m08;
DROP TABLE
Time: 5.919 ms

keith@keith=# select * from only measurement;
 logdate | peaktemp | unitsales
---------+----------+-----------
(0 rows)

Time: 0.307 ms

keith@keith=# insert into measurement (logdate) values ('2016-08-01');
ERROR:  no partition of relation "measurement" found for row
DETAIL:  Failing row contains (2016-08-01, 1, null).
Time: 0.622 ms