Re: [HACKERS] why not parallel seq scan for slow functions

2017-07-11 Thread Amit Kapila
On Wed, Jul 12, 2017 at 1:50 AM, Jeff Janes  wrote:
> On Mon, Jul 10, 2017 at 9:51 PM, Dilip Kumar  wrote:
>>
>> So because of this high projection cost the seqpath and parallel path
>> both have fuzzily the same cost, but seqpath is winning because it's
>> parallel safe.
>
>
> I think you are correct.  However, unless parallel_tuple_cost is set very
> low, apply_projection_to_path never gets called with the Gather path as an
> argument.  It gets ruled out at some earlier stage, presumably because it
> assumes the projection step cannot make it win if it is already behind by
> enough.
>

I think that is genuine because the tuple communication cost is very high.
If your table is reasonably large, then you might want to try increasing
the number of parallel workers (Alter Table ... Set (parallel_workers =
..)).
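
For illustration (an invented table name and worker count, not from this
thread):

ALTER TABLE big_table SET (parallel_workers = 4);
-- then re-run EXPLAIN on the query to see whether a Gather plan is chosen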

> So the attached patch improves things, but doesn't go far enough.
>

It seems to me that we need to adjust the cost based on whether the node
below is projection capable.  See attached.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


subpath_projection_cost.2.patch
Description: Binary data



Re: [HACKERS] pgsql 10: hash indexes testing

2017-07-11 Thread Amit Kapila
On Tue, Jul 11, 2017 at 11:08 PM, Alvaro Herrera
 wrote:
> Amit Kapila wrote:
>
>> Yes, I also think the same idea can be used; in fact, I mentioned
>> it [1] as soon as you committed that patch.  Do we want to do
>> anything at this stage for PG-10?  I don't think we should attempt
>> something this late unless people feel this is a show-stopper issue
>> for usage of hash indexes.  If required, I think a separate function
>> can be provided to allow users to perform the squeeze operation.
>
> Sorry, I have no idea how critical this squeeze thing is for the
> newfangled hash indexes, so I cannot comment on that.  Does this make
> the indexes unusable in some way under some circumstances?
>

It seems so.  Basically, in the case of a large number of duplicates,
we hit the maximum number of overflow pages.  There is a theoretical
possibility of hitting it, but it could also be that we are not
freeing the existing unused overflow pages, due to which the index keeps
growing and hits the limit.  I have requested upthread to verify whether
that is happening in this case and am still waiting for the same.  The
squeeze operation does free such unused overflow pages after cleaning
them.  As this is a costly operation and needs a cleanup lock, we
currently perform it only during Vacuum and during the next split from
the bucket which can have redundant overflow pages.
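
As an aside, one way to watch the overflow-page count is the pgstattuple
extension's pgstathashindex() function (an illustrative session; the index
name is a placeholder):

CREATE EXTENSION pgstattuple;
SELECT bucket_pages, overflow_pages, free_percent
FROM pgstathashindex('my_hash_index');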



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Minor style cleanup in tab-complete.c

2017-07-11 Thread Michael Paquier
On Wed, Jul 12, 2017 at 11:19 AM, Thomas Munro
 wrote:
> From the triviality department: I noticed some branches in
> tab-complete.c's gargantuan if statement, mostly brand new, that break
> from the established brace style.  Should we fix that like this?

For consistency's sake, I think we should.
-- 
Michael




Re: [HACKERS] New partitioning - some feedback

2017-07-11 Thread Amit Langote
On 2017/07/12 12:47, Ashutosh Bapat wrote:
> On Wed, Jul 12, 2017 at 8:23 AM, Amit Langote
>  wrote:
>> On 2017/07/11 18:57, Ashutosh Bapat wrote:
>>> On Tue, Jul 11, 2017 at 4:16 AM, David Fetter  wrote:
>>>> So whatever we land on needs to mention partition_of and
>>>> has_partitions.  Is the latter just its immediate partitions?
>>>> Recursion all the way down?  Somewhere in between?

>>>
>>> We have patches proposed to address some of those concerns at [1]
>>>
>>> [1] 
>>> https://www.postgresql.org/message-id/CAFjFpRcs5fOSfaAGAjT5C6=yvdd7mrx3knf_spb5dqzojgj...@mail.gmail.com
>>
>> ISTM, David is talking about the "list tables" (bare \d without any
>> pattern) case.  That is, listing partitioned tables as of type
>> "partitioned table" instead of "table" as we currently do.  The linked
>> patch, OTOH, is for the "describe table" (\d ) case.
> 
> Right, the patches don't do exactly what David is suggesting, but I
> believe they have code to annotate the tables with "has partitions"
> and also the number of partitions (I guess). That thread died some
> time ago, though, so my memory may be vague.
> 
> Do you see any way those patches can be used in the current discussion?

It wouldn't really be a bad idea to bring that patch over here, because
there's no special reason for it to sit in the CF for PG 11 if we are
talking here about changing the \d command output anyway.

Thanks,
Amit





Re: [HACKERS] New partitioning - some feedback

2017-07-11 Thread Ashutosh Bapat
On Wed, Jul 12, 2017 at 8:23 AM, Amit Langote
 wrote:
> On 2017/07/11 18:57, Ashutosh Bapat wrote:
>> On Tue, Jul 11, 2017 at 4:16 AM, David Fetter  wrote:
>>> So whatever we land on needs to mention partition_of and
>>> has_partitions.  Is the latter just its immediate partitions?
>>> Recursion all the way down?  Somewhere in between?
>>>
>>
>> We have patches proposed to address some of those concerns at [1]
>>
>> [1] 
>> https://www.postgresql.org/message-id/CAFjFpRcs5fOSfaAGAjT5C6=yvdd7mrx3knf_spb5dqzojgj...@mail.gmail.com
>
> ISTM, David is talking about the "list tables" (bare \d without any
> pattern) case.  That is, listing partitioned tables as of type
> "partitioned table" instead of "table" as we currently do.  The linked
> patch, OTOH, is for the "describe table" (\d ) case.

Right, the patches don't do exactly what David is suggesting, but I
believe they have code to annotate the tables with "has partitions"
and also the number of partitions (I guess). That thread died some
time ago, though, so my memory may be vague.

Do you see any way those patches can be used in the current discussion?


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] New partitioning - some feedback

2017-07-11 Thread Amit Langote
On 2017/07/11 13:34, Alvaro Herrera wrote:
> Robert Haas wrote:
>> On Mon, Jul 10, 2017 at 2:15 AM, Amit Langote
>>  wrote:
> 
>>> Actually, if \d had shown RELKIND_PARTITIONED_TABLE tables as of Type
>>> "partitioned table", we wouldn't need a separate flag for marking a table
>>> as having partitions.
>>
>> I think that is false.  Whether something is partitioned and whether
>> it is a partition are independent concerns.
> 
> Maybe this discussion is easier if we differentiate "list tables" (\dt,
> or \d without a pattern) from "describe table" (\d with a name pattern).

I think this discussion has mostly focused on "list tables" so far.

> It seems to me that the "describe" command should list partitions --
> perhaps only when the + flag is given.

That's what happens today.

> However, the "list tables"
> command \dt should definitely IMO not list partitions.

Do you mean never?  Even if a modifier is specified?  In the patch I
proposed, \d! (or \d+ or \d++, if '!' turns out to be unpopular) will list
partitions, but \d or \dt won't.  That is, partitions are hidden by default.

> Maybe \dt should
> have some flag indicating whether each table is partitioned.

So it seems most of us are in favor of showing partitioned tables as
"partitioned table" instead of "table" in the table listing.

Thanks,
Amit





Re: [HACKERS] New partitioning - some feedback

2017-07-11 Thread Amit Langote
On 2017/07/11 18:57, Ashutosh Bapat wrote:
> On Tue, Jul 11, 2017 at 4:16 AM, David Fetter  wrote:
>> So whatever we land on needs to mention partition_of and
>> has_partitions.  Is the latter just its immediate partitions?
>> Recursion all the way down?  Somewhere in between?
>>
> 
> We have patches proposed to address some of those concerns at [1]
> 
> [1] 
> https://www.postgresql.org/message-id/CAFjFpRcs5fOSfaAGAjT5C6=yvdd7mrx3knf_spb5dqzojgj...@mail.gmail.com

ISTM, David is talking about the "list tables" (bare \d without any
pattern) case.  That is, listing partitioned tables as of type
"partitioned table" instead of "table" as we currently do.  The linked
patch, OTOH, is for the "describe table" (\d ) case.

Thanks,
Amit





[HACKERS] Minor style cleanup in tab-complete.c

2017-07-11 Thread Thomas Munro
Hi,

From the triviality department: I noticed some branches in
tab-complete.c's gargantuan if statement, mostly brand new, that break
from the established brace style.  Should we fix that like this?

-- 
Thomas Munro
http://www.enterprisedb.com


remove-extraneous-braces-from-tab-complete.patch
Description: Binary data



Re: [HACKERS] Subscription code improvements

2017-07-11 Thread Masahiko Sawada
On Sat, Jul 8, 2017 at 5:19 AM, Petr Jelinek
 wrote:
> Hi,
>
> I have done some review of the subscription handling (well, self-review)
> and here is the result of that (it's a slightly improved version from
> another thread [1]).

Thank you for working on this and for the patches!

> I split it into several patches:
>
> 0001 - Makes the subscription worker exit nicely when the subscription is
> missing (ie it was removed while the worker was starting) and also makes
> the disabled message look the same as the message emitted when the
> disabled state is detected while the worker is running. This is mostly
> just nicer behavior for the user (ie no unexpected errors in the log when
> you have just disabled the subscription).
>
> 0002 - This is a bugfix - the sync worker should exit when it is waiting
> for apply and apply dies; otherwise there is a possibility of not being
> correctly synchronized.
>
> 0003 - Splits the schizophrenic SetSubscriptionRelState function into two
> functions which don't try to do a broken upsert and report proper errors
> instead.
>
> 0004 - Solves the issue which Masahiko Sawada reported [2] about ALTER
> SUBSCRIPTION's handling of workers not being transactional - we now kill
> workers transactionally (and we do the same when possible in DROP
> SUBSCRIPTION).
>
> 0005 - Bugfix to allow syscache access to subscriptions without being
> connected to a database - this should work since pg_subscription is a
> pinned catalog; it wasn't caught because nothing in core is using it
> (yet).
>
> 0006 - Makes the locking of subscriptions more granular (this depends on
> 0005). This removes the ugly AccessExclusiveLock on the pg_subscription
> catalog from DROP SUBSCRIPTION and also solves some potential race
> conditions between the launcher, ALTER SUBSCRIPTION and
> process_syncing_tables_for_apply(). It also simplifies memory handling in
> the launcher as well as in the logicalrep_worker_stop() function. This
> basically makes subscriptions behave like every other object in the
> database in terms of locking.
>
> Only 0002, 0004 and 0005 are actual bug fixes, but I'd still like to get
> it all into PG10, as the locking especially now behaves really
> differently from everything else, and that does not seem like a good
> idea.
>

I'm now planning to review the 0002, 0004 and 0005 patches first as they
are bug fixes. Should we add them to the open item list?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Multi column range partition table

2017-07-11 Thread Amit Langote
On 2017/07/12 4:24, Dean Rasheed wrote:
> On 11 July 2017 at 13:29, Ashutosh Bapat  
> wrote:
>> Most of the patch seems to be replacing "content" with "kind",
>> RangeDatumContent with PartitionRangeDatumKind, and RANGE_DATUM_FINITE
>> with PARTITION_RANGE_DATUM_VALUE. But those changes in name don't seem
>> to add much value to the patch. Replacing RANGE_DATUM_NEG_INF
>> and RANGE_DATUM_POS_INF with PARTITION_RANGE_DATUM_MINVALUE and
>> PARTITION_RANGE_DATUM_MAXVALUE looks like a good change in line with
>> the MINVALUE/MAXVALUE change. Maybe we should reuse the previous
>> variable and enum type names, except for those two, so that the total
>> change introduced by the patch is minimal.
>>
> 
> No, this isn't just renaming that other enum. It's about replacing the
> boolean "infinite" flag on PartitionRangeDatum with something that can
> properly enumerate the 3 kinds of PartitionRangeDatum that are allowed
> (and, as noted above, "finite"/"infinite" isn't the right terminology
> either). Putting that new enum in parsenodes.h makes it globally
> available, wherever the PartitionRangeDatum structure is used. A
> side-effect of that change is that the old RangeDatumContent enum that
> was local to partition.c is no longer needed.
> 
> RangeDatumContent wouldn't be a good name for a globally visible enum
> of this kind because that name fails to link it to partitioning in any
> way, and could easily be confused as having something to do with RTEs
> or range types. Also, the term "content" is more traditionally used in
> the Postgres sources for a field *holding* content, rather than a
> field specifying the *kind* of content. On the other hand, you'll note
> that the term "kind" is by far the most commonly used term for naming
> this kind of enum, and any matching fields.
> 
> IMO, code consistency and readability take precedence over keeping
> patch sizes down.

I agree with Dean here that the new global PartitionRangeDatumKind enum is
an improvement over the previous infinite flag in the parse node plus the
partition.c-local RangeDatumContent enum, for all the reasons he mentioned.

Thanks,
Amit





Re: [HACKERS] A bug in mapping attributes in ATExecAttachPartition()

2017-07-11 Thread Amit Langote
On 2017/07/11 19:49, Ashutosh Bapat wrote:
> On Tue, Jul 4, 2017 at 9:51 AM, Amit Langote
>  wrote:
> 
>>
>> Attached updated patches.
> 
> There's an extra "we" in
> +* Note that attachRel's OID is in this list.  If it's partitioned, we
> +* we don't need to schedule it to be scanned (would be a noop anyway
> 
> And some portions of the comment before find_all_inheritors() in
> ATExecAttachPartition() look duplicated in portions of the code that
> check constraints on the table being attached and each of its leaf
> partitions.
> 
> Other than that the patches look good to me.

Thanks for the review.  Patch updated taking care of the comments.

Regards,
Amit
From b7eb9a2bc0d5742f826da979e0bda5d4d2c25472 Mon Sep 17 00:00:00 2001
From: amit 
Date: Wed, 14 Jun 2017 11:32:01 +0900
Subject: [PATCH 1/2] Cope with differing attnos in ATExecAttachPartition code

If the table being attached has attnos different from the parent for
the partitioning columns which are present in the partition constraint
expressions, then predicate_implied_by() will prematurely return false
due to structural inequality of the corresponding Var expressions in
the partition constraint and those in the table's check constraint
expressions.  Fix this by mapping the partition constraint's expressions
to bear the partition's attnos.

Further, if the validation scan needs to be performed after all and
the table being attached is a partitioned table, we will need to map
the constraint expression again to change the attnos to the individual
leaf partition's attnos from those of the table being attached.
---
 src/backend/commands/tablecmds.c  | 87 +++
 src/test/regress/expected/alter_table.out | 45 
 src/test/regress/sql/alter_table.sql  | 38 ++
 3 files changed, 137 insertions(+), 33 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index bb00858ad1..8047c9a7bc 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -13421,7 +13421,7 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
 {
RelationattachRel,
catalog;
-   List   *childrels;
+   List   *attachRel_children;
TupleConstr *attachRel_constr;
List   *partConstraint,
   *existConstraint;
@@ -13489,10 +13489,25 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
/*
 * Prevent circularity by seeing if rel is a partition of attachRel. (In
 * particular, this disallows making a rel a partition of itself.)
+*
+* We do that by checking if rel is a member of the list of attachRel's
+* partitions provided the latter is partitioned at all.  We want to avoid
+* having to construct this list again, so we request the strongest lock
+* on all partitions.  We need the strongest lock, because we may decide
+* to scan them if we find out that the table being attached (or its leaf
+* partitions) may contain rows that violate the partition constraint.
+* If the table has a constraint that would prevent such rows, which by
+* definition is present in all the partitions, we need not scan the
+* table, nor its partitions.  But we cannot risk a deadlock by taking a
+* weaker lock now and the stronger one only when needed.
+*
+* XXX - Do we need to lock the partitions here if we already have the
+* strongest lock on attachRel?  The information we need here to check
+* for circularity cannot change without taking a lock on attachRel.
 */
-   childrels = find_all_inheritors(RelationGetRelid(attachRel),
-                                   AccessShareLock, NULL);
-   if (list_member_oid(childrels, RelationGetRelid(rel)))
+   attachRel_children = find_all_inheritors(RelationGetRelid(attachRel),
+                                            AccessExclusiveLock, NULL);
+   if (list_member_oid(attachRel_children, RelationGetRelid(rel)))
ereport(ERROR,
(errcode(ERRCODE_DUPLICATE_TABLE),
 errmsg("circular inheritance not allowed"),
@@ -13603,6 +13618,13 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
partConstraint = list_make1(make_ands_explicit(partConstraint));
 
/*
+* Adjust the generated constraint to match this partition's attribute
+* numbers.
+*/
+   partConstraint = map_partition_varattnos(partConstraint, 1, attachRel,
+                                            rel);
+
+   /*
 * Check if we can 

[HACKERS] Finding the min bounding box of an Index leaf page

2017-07-11 Thread Amira Shawky
Greetings,


I have SP-GiST and GiST indexes built on 2d points. I'm wondering, does
spgist/gist track the bounding box of each leaf page (data block)?

And if yes, how can I access it?



Best Regards
---
Amira Shawky Mohamed
TA at Computer Dept.
Faculty of Engineering
Cairo University



Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-07-11 Thread Alvaro Herrera
Mark Rofail wrote:

>    - now the RI checks utilise the @>(anyarray, anyelement)
>    - however there's a small problem:
>      operator does not exist: integer[] @> smallint
>      I assume that external casting would be required here. But how can I
>      downcast smallint to integer or integer to numeric automatically?

We have one opclass for each type combination -- int4 to int2, int4 to
int4, int4 to int8, etc.  You just need to add the new strategy to all
the opclasses.
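
For reference, an illustrative catalog query (not from this thread) that
lists the cross-type members of the btree integer_ops family; each such
type combination would need the new strategy added:

SELECT amoplefttype::regtype, amoprighttype::regtype,
       amopstrategy, amopopr::regoperator
FROM pg_amop
WHERE amopfamily = (SELECT oid FROM pg_opfamily
                    WHERE opfname = 'integer_ops'
                      AND opfmethod = (SELECT oid FROM pg_am
                                       WHERE amname = 'btree'))
ORDER BY 1, 2, 3;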

BTW now that we've gone through this a little further, it's starting to
look like a mistake to me to use the same @> operator for (anyarray,
anyelement) than we use for (anyarray, anyarray).  I have the feeling
we'd do better by having some other operator for this purpose -- dunno,
maybe @>> or @>.  ... whatever you think is reasonable and not already
in use.  Unless there is some other reason to pick @> for this purpose.

-- 
Álvaro Herrera        https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-07-11 Thread Mark Rofail
Here are the modifications to ri_triggers.c.

On Wed, Jul 12, 2017 at 12:26 AM, Mark Rofail 
wrote:
>
> *What I did *
>
>- now the RI checks utilise the @>(anyarray, anyelement)
>
> Best Regards,
> Mark Rofail
>
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 3a25ba52f3..2d2b8e6a4f 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -2650,7 +2650,7 @@ quoteRelationName(char *buffer, Relation rel)
  * ri_GenerateQual --- generate a WHERE clause equating two variables
  *
  * The idea is to append " sep leftop op rightop" to buf, or if fkreftype is
- * FKCONSTR_REF_EACH_ELEMENT, append " sep leftop op ANY(rightop)" to buf.
+ * FKCONSTR_REF_EACH_ELEMENT, append " sep leftop <@ rightop" to buf.
  *
  * The complexity comes from needing to be sure that the parser will select
  * the desired operator.  We always name the operator using
@@ -2694,21 +2694,34 @@ ri_GenerateQual(StringInfo buf,
  	else
  		oprright = operform->oprright;
  
-	appendStringInfo(buf, " %s %s", sep, leftop);
-	if (leftoptype != operform->oprleft)
-		ri_add_cast_to(buf, operform->oprleft);
- 
- 	appendStringInfo(buf, " OPERATOR(%s.%s) ",
+ 	if (fkreftype == FKCONSTR_REF_EACH_ELEMENT){
+		appendStringInfo(buf, " %s %s", sep, rightop);
+
+		if (rightoptype != oprright)
+ 			ri_add_cast_to(buf, oprright);
+
+		appendStringInfo(buf, " @> ");
+
+		appendStringInfoString(buf, leftop);
+
+		if (leftoptype != operform->oprleft)
+			ri_add_cast_to(buf, operform->oprleft);
+	 }	
+	else{
+		appendStringInfo(buf, " %s %s", sep, leftop);
+
+		if (leftoptype != operform->oprleft)
+			ri_add_cast_to(buf, operform->oprleft);
+
+		appendStringInfo(buf, " OPERATOR(%s.%s) ",
  	 quote_identifier(nspname), oprname);
- 
- 	if (fkreftype == FKCONSTR_REF_EACH_ELEMENT)
- 		appendStringInfoString(buf, "ANY (");
- 	appendStringInfoString(buf, rightop);
- 	if (rightoptype != oprright)
- 		ri_add_cast_to(buf, oprright);
- 	if (fkreftype == FKCONSTR_REF_EACH_ELEMENT)
- 		appendStringInfoChar(buf, ')');
 
+		appendStringInfoString(buf, rightop);
+
+		if (rightoptype != oprright)
+ 			ri_add_cast_to(buf, oprright);
+	}
+	
 	ReleaseSysCache(opertup);
 }
 



Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-07-11 Thread Mark Rofail
On Sun, Jul 9, 2017 at 7:42 PM, Alexander Korotkov 
wrote:

> We may document that GIN index is required to accelerate RI queries for
> array FKs.  And users are encouraged to manually define them.
> It's also possible to define new option when index on referencing
> column(s) would be created automatically.  But I think this option should
> work the same way for regular FKs and array FKs.
>

I just thought that because a GIN index is suited to composite elements, it
would be appropriate for array FKs.

So we should leave it to the user? I think that would be fine too.

*What I did*

   - now the RI checks utilise the @>(anyarray, anyelement)
   - however there's a small problem:
     operator does not exist: integer[] @> smallint
     I assume that external casting would be required here. But how can I
     downcast smallint to integer or integer to numeric automatically?
     (A sketch of the problem follows below.)
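
A sketch of the failure and the manual-cast workaround (a hypothetical
session; the @>(anyarray, anyelement) operator here is the one this patch
adds, not one that exists in stock Postgres):

SELECT ARRAY[1,2,3]::integer[] @> 2::smallint;            -- ERROR: operator does not exist
SELECT ARRAY[1,2,3]::integer[] @> (2::smallint)::integer; -- works once the element is upcast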

*What I plan to do*

   - work on the above-mentioned bug/limitation
   - otherwise, I think this concludes limitation #5

> fatal performance issues.  If you issue any UPDATE or DELETE against the PK
> table, you get a query like this for checking to see if the RI constraint
> would be violated:
>
> SELECT 1 FROM ONLY fktable x WHERE $1 = ANY (fkcol) FOR SHARE OF x;

or is there anything remaining?

Best Regards,
Mark Rofail


Re: [HACKERS] why not parallel seq scan for slow functions

2017-07-11 Thread Jeff Janes
On Mon, Jul 10, 2017 at 9:51 PM, Dilip Kumar  wrote:

>
>
> In the function below, we always multiply target->cost.per_tuple by
> path->rows, but in the case of Gather it should be multiplied by
> subpath->rows
>
> apply_projection_to_path()
> 
>
> path->startup_cost += target->cost.startup - oldcost.startup;
> path->total_cost += target->cost.startup - oldcost.startup +
> (target->cost.per_tuple - oldcost.per_tuple) * path->rows;
>
>
> So because of this high projection cost the seqpath and parallel path
> both have fuzzily the same cost, but seqpath is winning because it's
> parallel safe.
>

I think you are correct.  However, unless parallel_tuple_cost is set very
low, apply_projection_to_path never gets called with the Gather path as an
argument.  It gets ruled out at some earlier stage, presumably because it
assumes the projection step cannot make it win if it is already behind by
enough.
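
For anyone reproducing this, a sketch of the kind of session involved (the
function and table names are invented, the cost value arbitrary):

SET parallel_tuple_cost = 0.001;  -- well below the 0.1 default
EXPLAIN SELECT slow_fn(x) FROM big_table;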

So the attached patch improves things, but doesn't go far enough.

Cheers,

Jeff


subpath_projection_cost.patch
Description: Binary data



Re: [HACKERS] Multi column range partition table

2017-07-11 Thread Dean Rasheed
On 11 July 2017 at 13:29, Ashutosh Bapat
 wrote:
> + 
> +  Also note that some element types, such as timestamp,
> +  have a notion of "infinity", which is just another value that can
> +  be stored. This is different from MINVALUE and
> +  MAXVALUE, which are not real values that can be stored,
> +  but rather they are ways of saying the value is unbounded.
> +  MAXVALUE can be thought of as being greater than any
> +  other value, including "infinity" and MINVALUE as being
> +  less than any other value, including "minus infinity". Thus the range
> +  FROM ('infinity') TO (MAXVALUE) is not an empty range; it
> +  allows precisely one value to be stored: the timestamp
> +  "infinity".
>   
>
> The description in this paragraph seems to attach the intuitive
> meaning of the word "unbounded" to MAXVALUE and MINVALUE, which have
> different intuitive meanings of their own. Not sure if that's how we
> should describe MAXVALUE/MINVALUE.
>

I'm not sure I understand your point. MINVALUE and MAXVALUE do mean
unbounded below and above respectively. This paragraph is just making
the point that that isn't the same as -/+ infinity.
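
Concretely, a sketch using the proposed syntax (invented table and column
names):

CREATE TABLE events (ts timestamp) PARTITION BY RANGE (ts);
CREATE TABLE events_inf PARTITION OF events
    FOR VALUES FROM ('infinity') TO (MAXVALUE);
-- exactly one value, the timestamp 'infinity', falls in this partition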


> Most of the patch seems to be replacing "content" with "kind",
> RangeDatumContent with PartitionRangeDatumKind, and RANGE_DATUM_FINITE
> with PARTITION_RANGE_DATUM_VALUE. But those changes in name don't seem
> to add much value to the patch. Replacing RANGE_DATUM_NEG_INF
> and RANGE_DATUM_POS_INF with PARTITION_RANGE_DATUM_MINVALUE and
> PARTITION_RANGE_DATUM_MAXVALUE looks like a good change in line with
> the MINVALUE/MAXVALUE change. Maybe we should reuse the previous
> variable and enum type names, except for those two, so that the total
> change introduced by the patch is minimal.
>

No, this isn't just renaming that other enum. It's about replacing the
boolean "infinite" flag on PartitionRangeDatum with something that can
properly enumerate the 3 kinds of PartitionRangeDatum that are allowed
(and, as noted above, "finite"/"infinite" isn't the right terminology
either). Putting that new enum in parsenodes.h makes it globally
available, wherever the PartitionRangeDatum structure is used. A
side-effect of that change is that the old RangeDatumContent enum that
was local to partition.c is no longer needed.

RangeDatumContent wouldn't be a good name for a globally visible enum
of this kind because that name fails to link it to partitioning in any
way, and could easily be confused as having something to do with RTEs
or range types. Also, the term "content" is more traditionally used in
the Postgres sources for a field *holding* content, rather than a
field specifying the *kind* of content. On the other hand, you'll note
that the term "kind" is by far the most commonly used term for naming
this kind of enum, and any matching fields.

IMO, code consistency and readability take precedence over keeping
patch sizes down.

Regards,
Dean




[HACKERS] Domains and arrays and composites, oh my

2017-07-11 Thread Tom Lane
I started to look into allowing domains over composite types, which is
another never-implemented case that there's no very good reason not to
allow.  Well, other than the argument that the SQL standard only allows
domains over "predefined" (built-in) types ... but we blew past that
restriction ages ago.

Any thought that there might be some fundamental problem with that was
soon dispelled when I noticed that we allow domains over arrays of
composite types.  Ahem.

They even work, mostly.  I wrote a few test cases and couldn't find
anything that failed except for attempts to insert or update multiple
subfields of the same base column; that's because process_matched_tle()
fails to look through CoerceToDomain nodes.  But that turns out to be a
bug even for the simpler case of domains over arrays of scalars, which
is certainly supported.  That is, given

create domain domainint4arr int4[];
create table domarrtest (testint4arr domainint4arr);

this ought to work:

insert into domarrtest (testint4arr[1], testint4arr[3]) values (11,22);

It would work with a plain-array target column, but with the domain
it fails with

ERROR:  multiple assignments to same column "testint4arr"

Hence, the attached proposed patch, which fixes that oversight and
adds some relevant test cases.  (I've not yet looked to see if any
documentation changes would be appropriate.)

I think this is a bug fix and should be back-patched.  Any objections?

regards, tom lane

diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index f3c7526..0a6f4f3 100644
*** a/src/backend/rewrite/rewriteHandler.c
--- b/src/backend/rewrite/rewriteHandler.c
*** process_matched_tle(TargetEntry *src_tle
*** 934,939 
--- 934,940 
  	const char *attrName)
  {
  	TargetEntry *result;
+ 	CoerceToDomain *coerce_expr = NULL;
  	Node	   *src_expr;
  	Node	   *prior_expr;
  	Node	   *src_input;
*** process_matched_tle(TargetEntry *src_tle
*** 970,979 
--- 971,1000 
  	 * For FieldStore, instead of nesting we can generate a single
  	 * FieldStore with multiple target fields.  We must nest when
  	 * ArrayRefs are involved though.
+ 	 *
+ 	 * As a further complication, the destination column might be a domain,
+ 	 * resulting in each assignment containing a CoerceToDomain node over a
+ 	 * FieldStore or ArrayRef.  These should have matching target domains, so
+ 	 * we strip them and reconstitute a single CoerceToDomain over the
+ 	 * combined FieldStore/ArrayRef nodes.  (Notice that this has the result
+ 	 * that the domain's checks are applied only after we do all the field or
+ 	 * element updates, not after each one.  This is arguably desirable.)
  	 *--
  	 */
  	src_expr = (Node *) src_tle->expr;
  	prior_expr = (Node *) prior_tle->expr;
+ 
+ 	if (src_expr && IsA(src_expr, CoerceToDomain) &&
+ 		prior_expr && IsA(prior_expr, CoerceToDomain) &&
+ 		((CoerceToDomain *) src_expr)->resulttype ==
+ 		((CoerceToDomain *) prior_expr)->resulttype)
+ 	{
+ 		/* we assume without checking that resulttypmod/resultcollid match */
+ 		coerce_expr = (CoerceToDomain *) src_expr;
+ 		src_expr = (Node *) ((CoerceToDomain *) src_expr)->arg;
+ 		prior_expr = (Node *) ((CoerceToDomain *) prior_expr)->arg;
+ 	}
+ 
  	src_input = get_assignment_input(src_expr);
  	prior_input = get_assignment_input(prior_expr);
  	if (src_input == NULL ||
*** process_matched_tle(TargetEntry *src_tle
*** 1042,1047 
--- 1063,1078 
  		newexpr = NULL;
  	}
  
+ 	if (coerce_expr)
+ 	{
+ 		/* put back the CoerceToDomain */
+ 		CoerceToDomain *newcoerce = makeNode(CoerceToDomain);
+ 
+ 		memcpy(newcoerce, coerce_expr, sizeof(CoerceToDomain));
+ 		newcoerce->arg = (Expr *) newexpr;
+ 		newexpr = (Node *) newcoerce;
+ 	}
+ 
  	result = flatCopyTargetEntry(src_tle);
  	result->expr = (Expr *) newexpr;
  	return result;
diff --git a/src/test/regress/expected/domain.out b/src/test/regress/expected/domain.out
index 41b70e2..9fb750d 100644
*** a/src/test/regress/expected/domain.out
--- b/src/test/regress/expected/domain.out
*** INSERT INTO domarrtest values ('{2,2}', 
*** 107,112 
--- 107,113 
  INSERT INTO domarrtest values (NULL, '{{"a","b","c"},{"d","e","f"}}');
  INSERT INTO domarrtest values (NULL, '{{"toolong","b","c"},{"d","e","f"}}');
  ERROR:  value too long for type character varying(4)
+ INSERT INTO domarrtest (testint4arr[1], testint4arr[3]) values (11,22);
  select * from domarrtest;
testint4arr  |testchar4arr 
  ---+-
*** select * from domarrtest;
*** 115,121 
   {2,2} | {{a,b},{c,d},{e,f}}
   {2,2} | {{a},{c}}
 | {{a,b,c},{d,e,f}}
! (5 rows)
  
  select testint4arr[1], testchar4arr[2:2] from domarrtest;
   testint4arr | testchar4arr 
--- 116,123 
   {2,2} | {{a,b},{c,d},{e,f}}
   {2,2} | {{a},{c}}
 | {{a,b,c},{d,e,f}}
!  

Re: [HACKERS] Arrays of domains

2017-07-11 Thread Andrew Dunstan


On 07/11/2017 12:44 PM, Tom Lane wrote:
>
> Can anyone think of a reason not to pursue that?
>
>   


I'm all for it.

cheers

andrew

-- 
Andrew Dunstan        https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] Arrays of domains

2017-07-11 Thread David Fetter
On Tue, Jul 11, 2017 at 12:44:33PM -0400, Tom Lane wrote:
> Over in
> https://www.postgresql.org/message-id/877ezgyn60@metapensiero.it
> there's a gripe about array_agg() not working for a domain type.
> It fails because we don't create an array type over a domain type,
> so the parser can't identify a suitable output type for the polymorphic
> aggregate.
> 
> We could imagine tweaking the polymorphic-function resolution rules
> so that a domain matched to ANYELEMENT is smashed to its base type,
> allowing ANYARRAY to be resolved as the base type's array type.
> While that would be a pretty localized fix, it seems like a kluge
> to me.
> 
> Probably a better answer is to start supporting arrays over domain
> types.  That was left unimplemented in the original domains patch,
> but AFAICS not for any better reason than lack of round tuits.
> I did find an argument here:
> https://www.postgresql.org/message-id/3c98f7f6.29fe1...@redhat.com
> that the SQL spec forbids domains over arrays, but that's the opposite
> case (and a restriction we long since ignored, anyway).
> 
> Can anyone think of a reason not to pursue that?

+1 for pursuing it.  When operations just compose, users get a more
fun experience.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] pgsql 10: hash indexes testing

2017-07-11 Thread Alvaro Herrera
Amit Kapila wrote:

> Yes, I also think the same idea can be used; in fact, I mentioned
> it [1] as soon as you committed that patch.  Do we want to do
> anything at this stage for PG-10?  I don't think we should attempt
> something this late unless people feel this is a show-stopper issue
> for usage of hash indexes.  If required, I think a separate function
> can be provided to allow users to perform the squeeze operation.

Sorry, I have no idea how critical this squeeze thing is for the
newfangled hash indexes, so I cannot comment on that.  Does this make
the indexes unusable in some way under some circumstances?

-- 
Álvaro Herrera        https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Fwd: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-07-11 Thread Claudio Freire
Resending without the .tar.bz2 that got blocked.


Sorry for the delay; I had extended vacations that kept me away from
my test rigs, and afterward testing took, literally, a few weeks.

I built a more thorough test script that produced some interesting
results. Will attach the results.
results. Will attach the results.

For now, to the review comments:

On Thu, Apr 27, 2017 at 4:25 AM, Masahiko Sawada  wrote:
> I've read this patch again and here are some review comments.
>
> + * Lookup in that structure proceeds sequentially in the list of segments,
> + * and with a binary search within each segment. Since segment's size grows
> + * exponentially, this retains O(log N) lookup complexity (2 log N to be
> + * precise).
>
> IIUC we now do binary search even over the list of segments.

Right

>
> -
>
> We often fetch a particular dead tuple segment. How about providing a
> macro for easier understanding?
> For example,
>
>  #define GetDeadTuplesSegment(lvrelstats, seg) \
>   (&(lvrelstats)->dead_tuples.dt_segments[(seg)])
>
> -
>
> +   if (vacrelstats->dead_tuples.num_segs == 0)
> +   return;
> +
>
> +   /* If uninitialized, we have no tuples to delete from the indexes */
> +   if (vacrelstats->dead_tuples.num_segs == 0)
> +   {
> +   return;
> +   }
>
> +   if (vacrelstats->dead_tuples.num_segs == 0)
> +   return false;
> +

Ok

> As I listed, there is code to check whether the dead tuples array is
> initialized in some places where the actual vacuum is done.
> I guess that it should not happen that we attempt to vacuum a
> table/index page while not having any dead tuples. Is it better to have
> an Assert or ereport instead?

I'm not sure. Having a non-empty dead tuples array is not necessary to
be able to honor the contract in the docstring. Most of those functions
clean up the heap/index of dead tuples given the array of dead tuples,
which is a no-op for an empty array.

The code that calls those functions doesn't bother calling them if the array
is known to be empty, true, but there's no compelling reason to enforce that
at the
interface. Doing so could cause subtle bugs rather than catch them
(in the form of unexpected assertion failures, if some caller forgot to
check the dead tuples array for emptiness).

If you're worried about the possibility that some bug fails to record
dead tuples in the array, and thus makes VACUUM silently ineffective,
I have instead added a test for that case. This should be a better approach,
since it's more likely to catch unexpected failure modes than an assert.

> @@ -1915,2 +2002,2 @@ count_nondeletable_pages(Relation onerel,
> LVRelStats *vacrelstats)
> -   BlockNumber prefetchStart;
> -   BlockNumber pblkno;
> +   BlockNumber prefetchStart;
> +   BlockNumber pblkno;
>
> I think that it's an unnecessary change.

Yep. But funnily enough, that's how it is now in master.

>
> -
>
> +   /* Search for the segment likely to contain the item pointer */
> +   iseg = vac_itemptr_binsrch(
> +   (void *) itemptr,
> +   (void *)
> &(vacrelstats->dead_tuples.dt_segments->last_dead_tuple),
> +   vacrelstats->dead_tuples.last_seg + 1,
> +   sizeof(DeadTuplesSegment));
> +
>
> I think that we can change the above to;
>
> +   /* Search for the segment likely to contain the item pointer */
> +   iseg = vac_itemptr_binsrch(
> +   (void *) itemptr,
> +   (void *) &(seg->last_dead_tuple),
> +   vacrelstats->dead_tuples.last_seg + 1,
> +   sizeof(DeadTuplesSegment));
>
> We set "seg = vacrelstats->dead_tuples.dt_segments" just before this.

Right

Attached is a current version of both patches, rebased since we're at it.

I'm also attaching the output from the latest benchmark runs, in raw
(tar.bz2) and digested (bench_report) forms, the script used to run
them (vacuumbench.sh) and to produce the reports
(vacuum_bench_report.sh).

Those are before the changes in the review. While I don't expect any
change, I'll re-run some of them just in case, and try to investigate
the slowdown. But that will take forever. Each run takes about a week
on my test rig, and I don't have enough hardware to parallelize the
tests. I will run a test on a snapshot of a particularly troublesome
production database we have, that should be interesting.

The benchmarks show a consistent improvement at scale 400, which may
be related to the search implementation being better somehow, and a
slowdown at scale 4000 in some variants. I believe this is due to
those variants having highly clustered indexes. While the "shuf"
(shuffled) variants were intended to be the opposite of that, I
suspect I somehow failed to get the desired outcome, so I'll be
double-checking that.

In any case the slowdown only materializes when vacuuming with a
large mwm (maintenance_work_mem) setting, which is something that
shouldn't happen unintentionally.

[HACKERS] Arrays of domains

2017-07-11 Thread Tom Lane
Over in
https://www.postgresql.org/message-id/877ezgyn60@metapensiero.it
there's a gripe about array_agg() not working for a domain type.
It fails because we don't create an array type over a domain type,
so the parser can't identify a suitable output type for the polymorphic
aggregate.
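
A minimal illustration of the failure (invented names; error text
approximate):

CREATE DOMAIN posint AS int4 CHECK (VALUE > 0);
CREATE TABLE t (x posint);
SELECT array_agg(x) FROM t;
-- ERROR:  could not find array type for data type posint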

We could imagine tweaking the polymorphic-function resolution rules
so that a domain matched to ANYELEMENT is smashed to its base type,
allowing ANYARRAY to be resolved as the base type's array type.
While that would be a pretty localized fix, it seems like a kluge
to me.

Probably a better answer is to start supporting arrays over domain
types.  That was left unimplemented in the original domains patch,
but AFAICS not for any better reason than lack of round tuits.
I did find an argument here:
https://www.postgresql.org/message-id/3c98f7f6.29fe1...@redhat.com
that the SQL spec forbids domains over arrays, but that's the opposite
case (and a restriction we long since ignored, anyway).

Can anyone think of a reason not to pursue that?

regards, tom lane




[HACKERS] GSoC 2017: weekly progress reports (week 6)

2017-07-11 Thread Shubham Barai
Project: Explicitly support predicate locks in index AMs besides b-tree


I have done the following tasks during this week.

1) worked on how to detect rw conflicts when fast update is enabled

2) created tests for different gin operators

3) went through some patches on commitfest to review

4) solved some issues that came up while testing

link to the code:
https://github.com/shubhambaraiss/postgres/commit/1365d75db36a4e398406dd266c3d4fe8e1ec30ff




Re: [HACKERS] Multi column range partition table

2017-07-11 Thread Ashutosh Bapat
On Sun, Jul 9, 2017 at 1:12 PM, Dean Rasheed  wrote:
> On 6 July 2017 at 22:43, Joe Conway  wrote:
>> I agree we should get this right the first time and I also agree with
>> Dean's proposal, so I guess I'm a +2
>>
>
> On 7 July 2017 at 03:21, Amit Langote  wrote:
>> +1 to releasing this syntax in PG 10.
>>
>
> So, that's 3 votes in favour of replacing UNBOUNDED with
> MINVALUE/MAXVALUE for range partition bounds in PG 10. Not a huge
> consensus, but no objections either. Any one else have an opinion?
>

+ 
+  Also note that some element types, such as timestamp,
+  have a notion of "infinity", which is just another value that can
+  be stored. This is different from MINVALUE and
+  MAXVALUE, which are not real values that can be stored,
+  but rather they are ways of saying the value is unbounded.
+  MAXVALUE can be thought of as being greater than any
+  other value, including "infinity" and MINVALUE as being
+  less than any other value, including "minus infinity". Thus the range
+  FROM ('infinity') TO (MAXVALUE) is not an empty range; it
+  allows precisely one value to be stored: the timestamp
+  "infinity".
  

The description in this paragraph seems to attach the intuitive
meaning of the word "unbounded" to MAXVALUE and MINVALUE, which have
different intuitive meanings of their own. Not sure if that's how we
should describe MAXVALUE/MINVALUE.

Most of the patch seems to be replacing "content" with "kind",
RangeDatumContent with PartitionRangeDatumKind, and RANGE_DATUM_FINITE
with PARTITION_RANGE_DATUM_VALUE. But those changes in name don't seem
to add much value to the patch. Replacing RANGE_DATUM_NEG_INF
and RANGE_DATUM_POS_INF with PARTITION_RANGE_DATUM_MINVALUE and
PARTITION_RANGE_DATUM_MAXVALUE looks like a good change in line with
the MINVALUE/MAXVALUE change. Maybe we should reuse the previous
variable and enum type names, except for those two, so that the total
change introduced by the patch is minimal.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] pgsql 10: hash indexes testing

2017-07-11 Thread Amit Kapila
On Tue, Jul 11, 2017 at 8:10 AM, Alvaro Herrera
 wrote:
> Amit Kapila wrote:
>> On Tue, Jul 11, 2017 at 6:51 AM, AP  wrote:
>> > On Fri, Jul 07, 2017 at 05:58:25PM +0530, Amit Kapila wrote:
>
>> >> I can understand your concerns.  To address first concern we need to
>> >> work on one or more of following work items: (a) work on vacuums that
>> >> can be triggered on insert only workload (it should perform index
>> >> vacuum as well) (b) separate utility statement/function to squeeze
>> >> hash index (c) db internally does squeezing like after each split, so
>> >> that chances of such a problem can be reduced, but that will be at the
>> >> cost of performance reduction in other workloads, so not sure if it is
>> >> advisable.  Among these (b) is simplest to do but may not be
>> >> convenient for the user.
>> >
>> > (a) seems like a good compromise on (c) if it can be done without 
>> > disruption
>> > and in time.
>> > (b) seems analogous to the path autovacuum took. Unless I misremember, 
>> > before
>> > autovacuum we had a cronjob to do similar work. It's probably a sane 
>> > path
>> > to take as a first step on the way to (a)
>> > (c) may not be worth the effort if it compromises general use, though 
>> > perhaps
>> > it could be used to indicate to (a) that now is a good time to handle
>> > this bit?
>>
>> Nice summarization!  I think before doing anything of that sort we
>> need opinions from others as well.  If some other community members
>> also see value in doing one or multiple of above things, then I can
>> write a patch.
>
> I haven't read the thread, but in late PG10 autovacuum gained the idea
> of "work items" that can be plugged from other parts of the server;
> currently BRIN uses it to cause a range to be summarized right after the
> next one starts being filled.  This is activated separately for each
> index via a reloption.  Perhaps something like that can be used for hash
> indexes?  See commit 7526e10224f0792201e99631567bbe44492bbde4.
>

Yes, I also think the same idea can be used; in fact, I mentioned
it [1] as soon as you committed that patch.  Do we want to do
anything at this stage for PG-10?  I don't think we should attempt
something this late unless people feel this is a show-stopper issue
for usage of hash indexes.  If required, I think a separate function
can be provided to allow users to perform the squeeze operation.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1%2BnVxAGmzj7RPUHscbTdAG1zQub6L9UqFJSu%3DgdwtY%2BpQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] pgsql 10: hash indexes testing

2017-07-11 Thread AP
On Fri, Jul 07, 2017 at 05:58:25PM +0530, Amit Kapila wrote:
> On Fri, Jul 7, 2017 at 8:22 AM, AP  wrote:
> > On Thu, Jul 06, 2017 at 05:19:59PM +0530, Amit Kapila wrote:
> >> I think if you are under development, it is always advisable to create
> >> indexes after initial bulk load.  That way it will be faster and will
> >> take lesser space atleast in case of hash index.
> >
> > This is a bit of a pickle, actually:
> > * if I do have a hash index I'll wind up with a bloated one at some stage
> >   that refused to allow more inserts until the index is re-created
> > * if I don't have an index then I'll wind up with a table where I cannot
> >   create a hash index because it has too many rows for it to handle
> >
> > I'm at a bit of a loss as to how to deal with this.
> 
> I can understand your concerns.  To address first concern we need to
> work on one or more of following work items: (a) work on vacuums that
> can be triggered on insert only workload (it should perform index
> vacuum as well) (b) separate utility statement/function to squeeze
> hash index (c) db internally does squeezing like after each split, so
> that chances of such a problem can be reduced, but that will be at the
> cost of performance reduction in other workloads, so not sure if it is
> advisable.  Among these (b) is simplest to do but may not be
> convenient for the user.

(a) seems like a good compromise on (c) if it can be done without disruption
and in time.
(b) seems analogous to the path autovacuum took. Unless I misremember, before
autovacuum we had a cronjob to do similar work. It's probably a sane path
to take as a first step on the way to (a)
(c) may not be worth the effort if it compromises general use, though perhaps
it could be used to indicate to (a) that now is a good time to handle
this bit?

> To address your second concern, we need to speed up the creation of
> hash index which is a relatively big project.  Having said that, I
> think in your case, this is one-time operation so spending once more
> time might be okay.

Yup. Primarily I just wanted the idea out there that this isn't that easy
to cope with manually and to get it onto a todo list (unless it was an
easy thing to do given a bit of thought, but it appears not).

Out of curiosity, and apologies if you explained it already and I missed
the significance of the words, how does this bloat happen? These tables
only cop COPY. There is no UPDATE or DELETE; all transactions get COMMITted,
so there's no ROLLBACK undoing the COPY, and yet the bloat occurs.

AP




Re: [HACKERS] A bug in mapping attributes in ATExecAttachPartition()

2017-07-11 Thread Ashutosh Bapat
On Tue, Jul 4, 2017 at 9:51 AM, Amit Langote
 wrote:

>
> Attached updated patches.

There's an extra "we" in
+* Note that attachRel's OID is in this list.  If it's partitioned, we
+* we don't need to schedule it to be scanned (would be a noop anyway

And some portions of the comment before find_all_inheritors() in
ATExecAttachPartition() look duplicated in portions of the code that
check constraints on the table being attached and each of its leaf
partitions.

Other than that the patches look good to me.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] New partitioning - some feedback

2017-07-11 Thread Ashutosh Bapat
On Tue, Jul 11, 2017 at 4:16 AM, David Fetter  wrote:
> On Mon, Jul 10, 2017 at 05:33:34PM -0500, Robert Haas wrote:
>> On Mon, Jul 10, 2017 at 2:15 AM, Amit Langote
>>  wrote:
>> > I posted a patch upthread which makes \d hide partitions
>> > (relispartition = true relations) and include them if the newly
>> > proposed '!' modifier is specified.  The '+' modifier is being
>> > used to show additional detail of relations chosen to be listed at
>> > all, so it seemed like a bad idea to extend its meaning to also
>> > dictate whether partitions are to be listed.
>>
>> +1.  That'd be a mess.
>
> With utmost respect, it's less messy than adding '!' to the already
> way too random and mysterious syntax of psql's \ commands.  What
> should '\det!' mean?  What about '\dT!'?
>
>> > Actually, if \d had shown RELKIND_PARTITIONED_TABLE tables as of
>> > Type "partitioned table", we wouldn't need a separate flag for
>> > marking a table as having partitions.
>>
>> I think that is false.  Whether something is partitioned and whether
>> it is a partition are independent concerns.
>
> So whatever we land on needs to mention partition_of and
> has_partitions.  Is the latter just its immediate partitions?
> Recursion all the way down?  Somewhere in between?
>

We have patches proposed to address some of those concerns at [1]

[1] 
https://www.postgresql.org/message-id/CAFjFpRcs5fOSfaAGAjT5C6=yvdd7mrx3knf_spb5dqzojgj...@mail.gmail.com

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] asynchronous execution

2017-07-11 Thread Antonin Houska
Kyotaro HORIGUCHI  wrote:

> > Just one idea that I had while reading the code.
> > 
> > In ExecAsyncEventLoop you iterate estate->es_pending_async, then move the
> > complete requests to the end and finally adjust
> > estate->es_num_pending_async so that the array no longer contains the
> > complete requests. I think the point is that then you can add new
> > requests to the end of the array.
> > 
> > I wonder if a set (Bitmapset) of incomplete requests would make the code
> > more efficient. The set would contain the position of each incomplete
> > request in estate->es_num_pending_async (I think it's the myindex field
> > of PendingAsyncRequest). If ExecAsyncEventLoop used this set to retrieve
> > the requests subject to ExecAsyncNotify etc, then the compaction of
> > estate->es_pending_async wouldn't be necessary.
> > 
> > ExecAsyncRequest would use the set to look for space for new requests by
> > iterating it and trying to find the first gap (which corresponds to a
> > completed request).
> > 
> > And finally, an item would be removed from the set at the moment the
> > request state is being set to ASYNCREQ_COMPLETE.
> 
> Effectively it is a waiting-queue followed by a
> completed-list. The point of the compaction is keeping the order
> of waiting or not-yet-completed requests, which is crucial to
> avoid a kind of precedence inversion. We cannot keep the order by
> using a bitmapset in such a way.

> The current code waits on all waiters at once and processes all
> fired events at once. The order in the waiting-queue is
> inessential in that case. On the other hand, I suppose waiting on
> several tens to near a hundred remote hosts is in a realistic
> target range. Keeping the order could be crucial if we process a
> part of the queue at once in that case.
> 
> Putting significance on the deviation of response times of
> remotes, process-all-at-once is effective. In turn we should
> consider the effectiveness of the lifecycle of the larger wait
> event set.

ok, I missed the fact that the order of es_pending_async entries is
important. I think this is worth a comment.

Actually, the reason I thought of a simplification was that I noticed a small
inefficiency in the way you do the compaction. In particular, I think it's not
always necessary to swap the tail and head entries. Would something like this
make sense?


/* If any node completed, compact the array. */
if (any_node_done)
{
    int     hidx = 0,
            tidx;

    /*
     * Swap all non-yet-completed items to the start of the array.
     * Keep them in the same order.
     */
    for (tidx = 0; tidx < estate->es_num_pending_async; ++tidx)
    {
        PendingAsyncRequest *tail = estate->es_pending_async[tidx];

        Assert(tail->state != ASYNCREQ_CALLBACK_PENDING);

        if (tail->state == ASYNCREQ_COMPLETE)
            continue;

        /*
         * If the array starts with one or more incomplete requests,
         * both head and tail point at the same item, so there's no
         * point in swapping.
         */
        if (tidx > hidx)
        {
            PendingAsyncRequest *head = estate->es_pending_async[hidx];

            /*
             * Once the tail got ahead, it should only leave
             * ASYNCREQ_COMPLETE behind. Only those can then be seen
             * by head.
             */
            Assert(head->state == ASYNCREQ_COMPLETE);

            estate->es_pending_async[tidx] = head;
            estate->es_pending_async[hidx] = tail;
        }

        ++hidx;
    }

    estate->es_num_pending_async = hidx;
}

And besides that, I think it'd be more intuitive if the meanings of "head" and
"tail" were reversed: if the array is iterated from lower to higher positions,
then I'd consider the head to be at the higher position, not the tail.

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at




[HACKERS] Typo in backend/storage/ipc/standby.c

2017-07-11 Thread Kyotaro HORIGUCHI
Hello.

I noticed that a comment above StandbyAcquireAccessExclusiveLock
in backend/storage/ipc/standby.c uses wrong names for a variable
and a type.

The attached patch fixes it. The same mistake is found in older
versions back to 9.0.

fix_typo_of_standby_c_10_master.patch is for 10 and master and
fix_typo_of_standby_c_96_and_before.patch for 9.6 and before.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
*** a/src/backend/storage/ipc/standby.c
--- b/src/backend/storage/ipc/standby.c
***
*** 587,596  StandbyLockTimeoutHandler(void)
   * one transaction on one relation.
   *
   * We keep a single dynamically expandible list of locks in local memory,
!  * RelationLockList, so we can keep track of the various entries made by
   * the Startup process's virtual xid in the shared lock table.
   *
!  * List elements use type xl_rel_lock, since the WAL record type exactly
   * matches the information that we need to keep track of.
   *
   * We use session locks rather than normal locks so we don't need
--- 587,596 
   * one transaction on one relation.
   *
   * We keep a single dynamically expandible list of locks in local memory,
!  * RecoveryLockList, so we can keep track of the various entries made by
   * the Startup process's virtual xid in the shared lock table.
   *
!  * List elements use type xl_standby_lock, since the WAL record type exactly
   * matches the information that we need to keep track of.
   *
   * We use session locks rather than normal locks so we don't need
*** a/src/backend/storage/ipc/standby.c
--- b/src/backend/storage/ipc/standby.c
***
*** 586,599  StandbyLockTimeoutHandler(void)
   * one transaction on one relation.
   *
   * We keep a single dynamically expandible list of locks in local memory,
!  * RelationLockList, so we can keep track of the various entries made by
   * the Startup process's virtual xid in the shared lock table.
   *
   * We record the lock against the top-level xid, rather than individual
   * subtransaction xids. This means AccessExclusiveLocks held by aborted
   * subtransactions are not released as early as possible on standbys.
   *
!  * List elements use type xl_rel_lock, since the WAL record type exactly
   * matches the information that we need to keep track of.
   *
   * We use session locks rather than normal locks so we don't need
--- 586,599 
   * one transaction on one relation.
   *
   * We keep a single dynamically expandible list of locks in local memory,
!  * RecoveryLockList, so we can keep track of the various entries made by
   * the Startup process's virtual xid in the shared lock table.
   *
   * We record the lock against the top-level xid, rather than individual
   * subtransaction xids. This means AccessExclusiveLocks held by aborted
   * subtransactions are not released as early as possible on standbys.
   *
!  * List elements use type xl_standby_lock, since the WAL record type exactly
   * matches the information that we need to keep track of.
   *
   * We use session locks rather than normal locks so we don't need



Re: [HACKERS] Multi column range partition table

2017-07-11 Thread Noah Misch
On Sun, Jul 09, 2017 at 08:42:32AM +0100, Dean Rasheed wrote:
> On 6 July 2017 at 22:43, Joe Conway  wrote:
> > I agree we should get this right the first time and I also agree with
> > Dean's proposal, so I guess I'm a +2
> >
> 
> On 7 July 2017 at 03:21, Amit Langote  wrote:
> > +1 to releasing this syntax in PG 10.
> >
> 
> So, that's 3 votes in favour of replacing UNBOUNDED with
> MINVALUE/MAXVALUE for range partition bounds in PG 10. Not a huge
> consensus, but no objections either. Any one else have an opinion?
> 
> Robert, have you been following this thread?
> 
> I was thinking of pushing this later today, in time for beta2.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Robert,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

