Re: [HACKERS] A GUC to prevent leader processes from running subplans?
On Sun, Nov 12, 2017 at 9:18 AM, Thomas Munro wrote:
> On Sat, Oct 21, 2017 at 8:09 AM, Robert Haas wrote:
>> On Tue, Oct 17, 2017 at 7:27 AM, Thomas Munro wrote:
>>
>> I don't think overloading force_parallel_mode is a good idea, but
>> having some other GUC for this seems OK to me. Not sure I like
>> multiplex_gather, though.
>
> How about parallel_leader_participation = on|off? The attached
> version has it that way, and adds regression tests to exercise on, off,
> and off-but-couldn't-start-any-workers for both kinds of gather node.
>
> I'm not sure why node->need_to_rescan is initialised by both
> ExecGatherInit() and ExecGather(). Only the latter's value matters,
> right?

I don't see anything like need_to_rescan in the GatherState node. Do
you intend to say need_to_scan_locally? If yes, then I think what you
said is right.

> I've added this to the January Commitfest.

+1 to this idea. Do you think such an option at the table level could be
meaningful? We have parallel_workers as a storage option for tables, so
users might want the leader to participate in parallelism only for some
of the tables.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
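For illustration, here is roughly how the proposed GUC and the existing table-level option would combine in a session. This is a sketch only: parallel_leader_participation exists only with the patch under discussion applied, and the table name is made up.

```sql
-- Proposed in the attached patch: keep the leader process from
-- executing the parallel portion of the plan itself.
SET parallel_leader_participation = off;

-- Existing per-table storage option: cap the planned worker count.
ALTER TABLE big_table SET (parallel_workers = 4);

-- With both in effect, only the workers scan big_table; the leader
-- just gathers tuples.
EXPLAIN (COSTS OFF) SELECT count(*) FROM big_table;
```

The table-level idea floated above would presumably be a third knob of the same kind, e.g. a hypothetical per-table variant of parallel_leader_participation.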
Re: [HACKERS] GatherMerge misses to push target list
On Sat, Nov 11, 2017 at 5:05 PM, Robert Haas wrote:
> On Mon, Sep 18, 2017 at 7:02 AM, Rushabh Lathia wrote:
>>> In that case, can you please mark the patch [1] as ready for committer
>>> in the CF app
>>
>> Done.
>
> I think this patch is mostly correct, but I think the change to
> planner.c isn't quite right. ordered_rel->reltarget is just a dummy
> target list that produces nothing. Instead, I think we should pass
> path->pathtarget, representing the idea that whatever Gather Merge
> produces as output is the same as what you put into it.

Agreed. Your change looks good to me.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] parallelize queries containing initplans
On Sat, Nov 11, 2017 at 12:15 AM, Robert Haas wrote:
> On Tue, Nov 7, 2017 at 4:45 AM, Amit Kapila wrote:
>> As mentioned, changed the status of the patch in CF app.
>
> I spent some time reviewing this patch today and found myself still
> quite uncomfortable with the fact that it was adding execution-time
> work to track the types of parameters - types that would usually not
> even be used. I found the changes to nodeNestLoop.c to be
> particularly objectionable, because we could end up doing the work
> over and over when it is actually not needed at all, or at most once.

That's right, but we are just accessing the tuple descriptor to get the
type, so there shouldn't be much work involved in that. However, your
approach has the merit that we don't even need to do that during
execution time.

> I decided to try instead teaching the planner to keep track of the
> types of PARAM_EXEC parameters as they were created, and that seems to
> work fine. See 0001, attached.

This looks good to me.

> 0002, attached, is my worked-over version of the rest of the patch. I
> moved the code that serializes and deserializes PARAM_EXEC from
> nodeSubplan.c -- which seemed like a strange choice -- to
> execParallel.c.

I tried to follow the practice we have used for PARAM_EXTERN params
(SerializeParamList is in params.c), and most of the handling of
initplans is done in nodeSubplan.c, so I chose to keep the newly added
functions there. However, keeping them in execParallel.c is also okay,
as we do that for the serialized plan.

> I removed the type OID from the serialization format
> because there's no reason to do that any more; the worker already
> knows the types from the plan. I did some renaming of the functions
> involved and some adjustment of the comments to refer to "PARAM_EXEC
> parameters" instead of initPlan parameters, because there's no reason
> that I can see why this can only work for initPlans. A Gather node on
> the inner side of a nested loop doesn't sound like a great idea, but I
> think this infrastructure could handle it (though it would need some
> more planner work).

I think it would need some work in execution as well, because the size
won't be fixed in that case for varchar-type params. We might end up
with something different as well.

> I broke a lot of long lines in your version of
> the patch into multiple lines; please try to be attentive to this
> issue when writing patches in general, as it is a bit tedious to go
> through and insert line breaks in many places.

Okay, but I sometimes rely on pgindent for such things, as for a few
things it becomes difficult to decide which way will be better.

> Please let me know your thoughts on the attached patches.

Few minor comments:

1.
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -23,6 +23,7 @@
 #include "postgres.h"

+#include "executor/execExpr.h"
 #include "executor/execParallel.h"
 #include "executor/executor.h"
 #include "executor/nodeBitmapHeapscan.h"
@@ -31,6 +32,7 @@
 #include "executor/nodeIndexscan.h"
 #include "executor/nodeIndexonlyscan.h"
 #include "executor/nodeSeqscan.h"
+#include "executor/nodeSubplan.h"

This is not required if we move serialize and other functions to
execParallel.c.

2.
+set_param_references(PlannerInfo *root, Plan *plan)
+{
+	Assert(IsA(plan, Gather) ||IsA(plan, GatherMerge));

I think there should be a space after the || operator.

3.
+/*
+ * Serialize ParamExecData params corresponding to initplans.
+ * ..
+/*
+ * Restore ParamExecData params corresponding to initplans.
+ */

Shouldn't we change the reference to initplans here as well?

I have fixed the first two in the attached patch and left the last one,
as I was not sure what you have in mind.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

0002-pq-pushdown-initplan-rebased-1.patch
Description: Binary data
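For context, the kind of plan shape this patch series is after — an initPlan evaluated once in the leader and shipped to workers as a PARAM_EXEC value — would look roughly like this (a sketch only: table and column names are invented, the exact EXPLAIN output depends on the patch version applied, and an unpatched server will not parallelize such a query):

```sql
EXPLAIN (COSTS OFF)
SELECT * FROM t1 WHERE t1.a = (SELECT max(b) FROM t2);

-- Approximate expected shape with the patch:
--   Gather
--     Workers Planned: 2
--     InitPlan 1 (returns $0)
--       ->  Aggregate
--             ->  Seq Scan on t2
--     ->  Parallel Seq Scan on t1
--           Filter: (a = $0)
```

The serialization being discussed is what carries the computed value of $0 from the leader into each worker's parameter space.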
Re: [HACKERS] [POC] Faster processing at Gather node
On Fri, Nov 10, 2017 at 9:48 AM, Amit Kapila wrote:
> On Fri, Nov 10, 2017 at 8:36 AM, Robert Haas wrote:
>> On Thu, Nov 9, 2017 at 9:31 PM, Amit Kapila wrote:
>>> Have you set force_parallel_mode=regress; before running the
>>> statement?
>>
>> Yes, I tried that first.
>>
>>> If so, then why do you need to tune other parallel query
>>> related parameters?
>>
>> Because I couldn't get it to break the other way, I then tried this.
>>
>> Instead of asking me what I did, can you tell me what I need to do?
>> Maybe a self-contained reproducible test case including exactly what
>> goes wrong on your end?
>
> I think we are missing something very basic, because you should see the
> failure by executing that statement with force_parallel_mode=regress
> even in a freshly created database.

I am seeing the assertion failure as below on executing the above
mentioned CREATE statement:

TRAP: FailedAssertion("!(!(tup->t_data->t_infomask & 0x0008))", File:
"heapam.c", Line: 2634)
server closed the connection unexpectedly
        This probably means the server terminated abnormally

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [POC] Faster processing at Gather node
On Fri, Nov 10, 2017 at 8:36 AM, Robert Haas wrote:
> On Thu, Nov 9, 2017 at 9:31 PM, Amit Kapila wrote:
>> Have you set force_parallel_mode=regress; before running the
>> statement?
>
> Yes, I tried that first.
>
>> If so, then why do you need to tune other parallel query
>> related parameters?
>
> Because I couldn't get it to break the other way, I then tried this.
>
> Instead of asking me what I did, can you tell me what I need to do?
> Maybe a self-contained reproducible test case including exactly what
> goes wrong on your end?

I think we are missing something very basic, because you should see the
failure by executing that statement with force_parallel_mode=regress
even in a freshly created database. I guess the missing point is that I
am using an assertions-enabled build and probably you are not (if this
is the reason, then it should have struck me the first time). Anyway, I
am writing steps to reproduce the issue:

1. initdb
2. start server
3. connect using psql
4. set force_parallel_mode=regress;
5. Create Table as_select1 AS SELECT * FROM pg_class WHERE relkind = 'r';

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
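The steps above can be sketched as one shell session (paths are placeholders; this assumes a locally built server with assertions enabled, i.e. configured with --enable-cassert, on PATH):

```
initdb -D /tmp/repro-data
pg_ctl -D /tmp/repro-data -l /tmp/repro.log start
psql -d postgres <<'SQL'
SET force_parallel_mode = regress;
CREATE TABLE as_select1 AS SELECT * FROM pg_class WHERE relkind = 'r';
SQL
```

On an assert-enabled build with the patch in question applied, the CREATE TABLE step is what trips the FailedAssertion reported above.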
Re: [HACKERS] [POC] Faster processing at Gather node
On Fri, Nov 10, 2017 at 12:05 AM, Robert Haas wrote:
> On Thu, Nov 9, 2017 at 12:08 AM, Amit Kapila wrote:
>> This change looks suspicious to me. I think here we can't use the
>> tupDesc constructed from the targetlist. One problem I could see is
>> that the check for the hasOid setting in tlist_matches_tupdesc won't
>> give the correct answer. In the case of a scan, we use the tuple
>> descriptor stored in the relation descriptor, which allows us to take
>> the right decision in tlist_matches_tupdesc. If you try the statement
>> CREATE TABLE as_select1 AS SELECT * FROM pg_class WHERE relkind = 'r';
>> with force_parallel_mode=regress, then you can reproduce the problem I
>> am trying to highlight.
>
> I tried this, but nothing seemed to be obviously broken. Then I
> realized that the CREATE TABLE command wasn't using parallelism, so I
> retried with parallel_setup_cost = 0, parallel_tuple_cost = 0, and
> min_parallel_table_scan_size = 0. That got it to use parallel query,
> but I still don't see anything broken. Can you clarify further?

Have you set force_parallel_mode=regress; before running the statement?
If so, then why do you need to tune other parallel query related
parameters?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Runtime Partition Pruning
On Thu, Nov 9, 2017 at 9:01 PM, Robert Haas wrote:
> On Thu, Nov 9, 2017 at 6:18 AM, Beena Emerson wrote:
>> The code still chooses the custom plan instead of the generic plan for
>> prepared statements. I am working on it.
>
> I don't think it's really the job of this patch to do anything about
> that problem.

+1. I think if we really want to do something about plan choice when
partitions are involved, that should be done as a separate patch.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
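The custom-vs-generic plan interaction being discussed can be observed with a session like the following (a sketch: table and column names are invented, and exact plan choice depends on costing):

```sql
PREPARE q(int) AS SELECT * FROM parted_table WHERE part_key = $1;

-- The first several executions use custom plans, re-planned with the
-- actual parameter value, so ordinary plan-time partition pruning
-- already applies and only one partition is scanned:
EXPLAIN EXECUTE q(42);

-- Runtime partition pruning (the subject of this patch) only matters
-- once the plancache switches to a generic plan, where $1 is unknown
-- at plan time -- which is the switch-over Beena is referring to.
EXPLAIN EXECUTE q(42);
```

That is why the plan-choice heuristic is a separable problem from the pruning infrastructure itself.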
Re: [HACKERS] why not parallel seq scan for slow functions
On Wed, Nov 8, 2017 at 6:48 PM, Robert Haas wrote:
> On Wed, Nov 8, 2017 at 7:26 AM, Amit Kapila wrote:
>> We do want to generate it later when there isn't inheritance involved,
>> but only if there is a single rel involved (simple_rel_array_size
>> <= 2). The rule is something like this: we will generate the gather
>> paths at this stage only if there are more than two rels involved and
>> there isn't inheritance involved.
>
> Why is that the correct rule?
>
> Sorry if I'm being dense here. I would have thought we'd want to skip
> it for the topmost scan/join rel regardless of the presence or absence
> of inheritance.

I think I understood your concern after some offlist discussion, and it
is primarily due to the inheritance-related check, which can skip the
generation of gather paths when it shouldn't. So what might fit better
here is a straight check on the number of base rels, such that we allow
generating a gather path in set_rel_pathlist only if there are multiple
baserels involved. I have used all_baserels, which I think will work
better for this purpose.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

parallel_paths_include_tlist_cost_v6.patch
Description: Binary data
Re: [HACKERS] [POC] Faster processing at Gather node
On Wed, Nov 8, 2017 at 1:02 AM, Andres Freund wrote:
> Hi,
>
> On 2017-11-06 10:56:43 +0530, Amit Kapila wrote:
>> On Sun, Nov 5, 2017 at 6:54 AM, Andres Freund wrote:
>>> On 2017-11-05 01:05:59 +0100, Robert Haas wrote:
>>>> skip-gather-project-v1.patch does what it says on the tin. I still
>>>> don't have a test case for this, and I didn't find that it helped
>>>> very much,
>>
>> I am also wondering in which case it can help, and I can't think of
>> the case.
>
> I'm confused? Isn't it fairly obvious that unnecessarily projecting
> at the gather node is wasteful? Obviously depending on the query you'll
> see smaller / bigger gains, but that there are benefits should be fairly
> obvious?

I agree that there could be benefits depending on the statement. I
initially thought that we were re-evaluating the expressions in the
target list as part of projection even if the worker backend had already
done that, but that was not the case; instead, we are deforming the
tuples sent by workers. Now, I think as a general principle it is a
good idea to delay the deforming as much as possible.

About the patch:

 /*
- * Initialize result tuple type and projection info.
- */
- ExecAssignResultTypeFromTL(&gatherstate->ps);
- ExecAssignProjectionInfo(&gatherstate->ps, NULL);
-
- /*
  * Initialize funnel slot to same tuple descriptor as outer plan.
  */
 if (!ExecContextForcesOids(&gatherstate->ps, &hasoid))
@@ -115,6 +109,12 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
 tupDesc = ExecTypeFromTL(outerNode->targetlist, hasoid);
 ExecSetSlotDescriptor(gatherstate->funnel_slot, tupDesc);

+ /*
+ * Initialize result tuple type and projection info.
+ */
+ ExecAssignResultTypeFromTL(&gatherstate->ps);
+ ExecConditionalAssignProjectionInfo(&gatherstate->ps, tupDesc, OUTER_VAR);
+

This change looks suspicious to me. I think here we can't use the
tupDesc constructed from the targetlist. One problem I could see is that
the check for the hasOid setting in tlist_matches_tupdesc won't give the
correct answer. In the case of a scan, we use the tuple descriptor
stored in the relation descriptor, which allows us to take the right
decision in tlist_matches_tupdesc. If you try the statement CREATE
TABLE as_select1 AS SELECT * FROM pg_class WHERE relkind = 'r'; with
force_parallel_mode=regress, then you can reproduce the problem I am
trying to highlight.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] why not parallel seq scan for slow functions
On Wed, Nov 8, 2017 at 4:34 PM, Robert Haas wrote:
> On Tue, Nov 7, 2017 at 9:41 PM, Amit Kapila wrote:
>> This is required to prohibit generating a gather path for the top rel
>> in case of inheritance (Append node) at this place (we want to
>> generate it later when the scan/join target is available).
>
> OK, but why don't we want to generate it later when there isn't
> inheritance involved?

We do want to generate it later when there isn't inheritance involved,
but only if there is a single rel involved (simple_rel_array_size <= 2).
The rule is something like this: we will generate the gather paths at
this stage only if there are more than two rels involved and there isn't
inheritance involved.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] why not parallel seq scan for slow functions
On Wed, Nov 8, 2017 at 2:51 AM, Robert Haas wrote:
> On Mon, Nov 6, 2017 at 9:57 PM, Amit Kapila wrote:
>
>>> Also, even if inheritance is used, we might still be the
>>> topmost scan/join target.
>>
>> Sure, but in that case, it won't generate the gather path here (due to
>> this part of the check, "!root->append_rel_list"). I am not sure
>> whether I have understood the second part of your question, so if my
>> answer appears inadequate, then can you provide more details on what
>> you are concerned about?
>
> I don't know why the question of whether root->append_rel_list is empty
> is the relevant thing here for deciding whether to generate gather
> paths at this point.

This is required to prohibit generating a gather path for the top rel in
case of inheritance (Append node) at this place (we want to generate it
later when the scan/join target is available). For such a case, the
reloptkind will be RELOPT_BASEREL and simple_rel_array_size will be
greater than two, as it includes the child rels as well. So, the check
for root->append_rel_list will prohibit generating a gather path for
such a rel. Now, for all the child rels of the Append, the reloptkind
will be RELOPT_OTHER_MEMBER_REL, so it won't generate a gather path here
because of the first part of the check (rel->reloptkind ==
RELOPT_BASEREL).

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] parallelize queries containing initplans
On Mon, Oct 30, 2017 at 9:00 AM, Amit Kapila wrote:
> On Wed, Oct 11, 2017 at 9:24 PM, Robert Haas wrote:
>> On Mon, Oct 9, 2017 at 5:56 AM, Amit Kapila wrote:
>>> How about always returning false for PARAM_EXTERN?
>>
>> Yeah, I think that's what we should do. Let's do that first as a
>> separate patch, which we might even want to back-patch, then return to
>> this.
>
> Now that the PARAM_EXTERN issue is fixed, I have rebased this patch.
> This patch had been switched to Ready For Committer in the last CF,
> then Robert had comments which I have addressed, so I think the status
> should be switched back to Ready For Committer.

As mentioned, changed the status of the patch in the CF app.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Plans and Cost of non-filter functions
On Mon, Nov 6, 2017 at 7:40 PM, Paul Ramsey wrote:
> From my perspective, this is much, much better. For sufficiently large
> tables, I get parallel behaviour without jimmying with the defaults on
> parallel_setup_cost and parallel_tuple_cost. *And*, the parallel
> behaviour *is* sensitive to the costs of functions in target lists, so
> reasonably chosen costs will flip us into a parallel mode for expensive
> functions against smaller tables too.

Thanks for the confirmation.

> Hopefully some variant of this finds its way into core! Is there any
> way I can productively help?

You have already helped a lot by providing the use case, but feel free
to ping on that thread if you find it is not moving.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] why not parallel seq scan for slow functions
On Mon, Nov 6, 2017 at 7:05 PM, Robert Haas wrote:
> On Mon, Nov 6, 2017 at 11:20 AM, Amit Kapila wrote:
>> On Mon, Nov 6, 2017 at 3:51 AM, Robert Haas wrote:
>>> This looks like it's on the right track to me. I hope Tom will look
>>> into it, but if he doesn't I may try to get it committed myself.
>>>
>>> -        if (rel->reloptkind == RELOPT_BASEREL)
>>> -            generate_gather_paths(root, rel);
>>> +        if (rel->reloptkind == RELOPT_BASEREL &&
>>> +            root->simple_rel_array_size > 2 &&
>>> +            !root->append_rel_list)
>>>
>>> This test doesn't look correct to me. Actually, it doesn't look
>>> anywhere close to correct to me. So, one of us is very confused...
>>> not sure whether it's you or me.
>>
>> It is quite possible that I haven't got it right, but it shouldn't be
>> completely bogus, as it passes the regression tests and some manual
>> verification. Can you explain what your concern about this test is?
>
> Well, I suppose that test will fire for a baserel when the total
> number of baserels is at least 3 and there's no inheritance involved.
> But if there are 2 baserels, we're still not the topmost scan/join
> target.

No, if there are 2 baserels, then simple_rel_array_size will be 3.
The simple_rel_array_size is always the number of relations plus one.
See setup_simple_rel_arrays.

> Also, even if inheritance is used, we might still be the
> topmost scan/join target.

Sure, but in that case, it won't generate the gather path here (due to
this part of the check, "!root->append_rel_list"). I am not sure
whether I have understood the second part of your question, so if my
answer appears inadequate, then can you provide more details on what
you are concerned about?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] why not parallel seq scan for slow functions
On Mon, Nov 6, 2017 at 3:51 AM, Robert Haas wrote:
> On Sun, Nov 5, 2017 at 12:57 AM, Amit Kapila wrote:
>> Thanks for the confirmation. Find rebased patch attached.
>
> This looks like it's on the right track to me. I hope Tom will look
> into it, but if he doesn't I may try to get it committed myself.
>
> -        if (rel->reloptkind == RELOPT_BASEREL)
> -            generate_gather_paths(root, rel);
> +        if (rel->reloptkind == RELOPT_BASEREL &&
> +            root->simple_rel_array_size > 2 &&
> +            !root->append_rel_list)
>
> This test doesn't look correct to me. Actually, it doesn't look
> anywhere close to correct to me. So, one of us is very confused...
> not sure whether it's you or me.

It is quite possible that I haven't got it right, but it shouldn't be
completely bogus, as it passes the regression tests and some manual
verification. Can you explain what your concern about this test is?

>         simple_gather_path = (Path *)
>             create_gather_path(root, rel, cheapest_partial_path,
>                                rel->reltarget, NULL, NULL);
> +
> +        /* Add projection step if needed */
> +        if (target && simple_gather_path->pathtarget != target)
> +            simple_gather_path = apply_projection_to_path(root, rel,
> +                                     simple_gather_path, target);
>
> Instead of using apply_projection_to_path, why not pass the correct
> reltarget to create_gather_path?

We need to push it to gather's subpath, as is done in
apply_projection_to_path, and then we have to cost it accordingly. I
think if we don't use apply_projection_to_path, then we might end up
with much of the code similar to it in generate_gather_paths. In fact,
I tried something similar to what you are suggesting in the first
version of the patch [1], and it didn't turn out to be clean. Also, I
think we already do something similar in create_ordered_paths.

> +    /* Set or update cheapest_total_path and related fields */
> +    set_cheapest(current_rel);
>
> I wonder if it's really OK to call set_cheapest() a second time for
> the same relation...

I think if we want, we can avoid it by checking whether we have
generated any gather path for the relation (basically, check if it has
a partial path list). Another idea could be that we consider the
generation of gather/gathermerge for the top-level scan/join relation
as a separate step and generate a new kind of upper rel for it, which
will be mostly dummy but will have paths for gather/gathermerge.

[1] - https://www.postgresql.org/message-id/CAA4eK1JUvL9WS9z%3D5hjSuSMNCo3TdBxFa0pA%3DE__E%3Dp6iUffUQ%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [POC] Faster processing at Gather node
On Sun, Nov 5, 2017 at 6:54 AM, Andres Freund wrote:
> On 2017-11-05 01:05:59 +0100, Robert Haas wrote:
>> skip-gather-project-v1.patch does what it says on the tin. I still
>> don't have a test case for this, and I didn't find that it helped very
>> much,

I am also wondering in which case it can help, and I can't think of the
case. Basically, as part of projection in the gather, I think we are
just deforming the tuple, which we anyway need to perform before
sending the tuple to the client (printtup) or probably at the upper
level of the node.

>> and you said this looked like a big bottleneck in your
>> testing, so here you go.

Is it possible that it shows the bottleneck only for an EXPLAIN ANALYZE
statement, as we don't deform the tuple for that at a later stage?

> The query where that showed a big benefit was
>
> SELECT * FROM lineitem WHERE l_suppkey > '5012' OFFSET 10 LIMIT 1;
>
> (i.e. a not very selective filter, and then just throwing the results
> away) still shows quite massive benefits:
>
> before:
> set parallel_setup_cost=0; set parallel_tuple_cost=0;
> set min_parallel_table_scan_size=0;
> set max_parallel_workers_per_gather=8;
> tpch_5[17938][1]=# explain analyze SELECT * FROM lineitem
>   WHERE l_suppkey > '5012' OFFSET 10 LIMIT 1;
>
>  Limit  (cost=635802.67..635802.69 rows=1 width=127) (actual time=8675.097..8675.097 rows=0 loops=1)
>    ->  Gather  (cost=0.00..635802.67 rows=27003243 width=127) (actual time=0.289..7904.849 rows=26989780 loops=1)
>          Workers Planned: 8
>          Workers Launched: 7
>          ->  Parallel Seq Scan on lineitem  (cost=0.00..635802.67 rows=3375405 width=127) (actual time=0.124..528.667 rows=3373722 loops=8)
>                Filter: (l_suppkey > 5012)
>                Rows Removed by Filter: 376252
>  Planning time: 0.098 ms
>  Execution time: 8676.125 ms
> (9 rows)
>
> after:
> tpch_5[19754][1]=# EXPLAIN ANALYZE SELECT * FROM lineitem
>   WHERE l_suppkey > '5012' OFFSET 10 LIMIT 1;
>
>  Limit  (cost=635802.67..635802.69 rows=1 width=127) (actual time=5984.916..5984.916 rows=0 loops=1)
>    ->  Gather  (cost=0.00..635802.67 rows=27003243 width=127) (actual time=0.214..5123.238 rows=26989780 loops=1)
>          Workers Planned: 8
>          Workers Launched: 7
>          ->  Parallel Seq Scan on lineitem  (cost=0.00..635802.67 rows=3375405 width=127) (actual time=0.025..649.887 rows=3373722 loops=8)
>                Filter: (l_suppkey > 5012)
>                Rows Removed by Filter: 376252
>  Planning time: 0.076 ms
>  Execution time: 5986.171 ms
> (9 rows)
>
> so there clearly is still benefit (this is scale 100, but that
> shouldn't make much of a difference).

Do you see the benefit if the query is executed without using EXPLAIN
ANALYZE?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
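One way to answer that last question without the per-tuple instrumentation overhead of EXPLAIN ANALYZE is psql's \timing, which reports client-observed elapsed time for the bare statement. A session sketch, assuming the same TPC-H setup as above:

```sql
\timing on
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
SET min_parallel_table_scan_size = 0;
SET max_parallel_workers_per_gather = 8;

-- Run the bare query; compare the reported "Time: ... ms"
-- before and after the patch:
SELECT * FROM lineitem WHERE l_suppkey > '5012' OFFSET 10 LIMIT 1;
```

Note this timing includes result transfer to the client, so it answers a slightly different question than the EXPLAIN ANALYZE numbers quoted above.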
Re: [HACKERS] Setting pd_lower in GIN metapage
On Sat, Nov 4, 2017 at 2:03 AM, Tom Lane wrote:
> Michael Paquier writes:
>> On Fri, Nov 3, 2017 at 1:10 AM, Amit Kapila wrote:
>>> On Fri, Nov 3, 2017 at 2:54 AM, Tom Lane wrote:
>>>> I've marked the CF entry closed. However, I'm not sure if we're
>>>> quite done with this thread. Weren't we going to adjust nbtree and
>>>> hash to be more aggressive about labeling their metapages as
>>>> REGBUF_STANDARD?
>>>
>>> I have already posted the patches [1] for the same in this thread,
>>> and those are reviewed [2][3] as well. I have adjusted the comments
>>> as per the latest commit. Please find updated patches attached.
>>
>> Confirmed. Setting those makes sense even if REGBUF_WILL_INIT is set,
>> at least for page masking.
>
> Thanks, I'd forgotten those patches were already posted. Looks good,
> so pushed.
>
> Looking around, I noted that contrib/bloom also had the disease of
> not telling log_newpage it was writing a standard-format metapage,
> so I fixed that too.

Thanks, Michael and Tom, for reviewing and committing the work.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Plans and Cost of non-filter functions
On Sat, Nov 4, 2017 at 4:43 AM, Tom Lane wrote:
> Paul Ramsey writes:
>>> Whether I get a parallel aggregate seems entirely determined by the
>>> number of rows, not the cost of preparing those rows.
>>
>> This is true, as far as I can tell, and unfortunate. Feeding tables
>> with 100ks of rows, I get parallel plans; feeding 10ks of rows, I
>> never do, no matter how costly the work going on within. That's true
>> of changing costs on the subquery select list, and on the aggregate
>> transfn.
>
> This sounds like it might be the same issue being discussed in
> https://www.postgresql.org/message-id/flat/CAMkU=1ycXNipvhWuweUVpKuyu6SpNjF=yhwu4c4us5jgvgx...@mail.gmail.com

I have rebased the patch being discussed on that thread. Paul, you
might want to check once with the recent patch [1] posted on the thread
mentioned by Tom.

[1] - https://www.postgresql.org/message-id/CAA4eK1%2B1H5Urm0_Wp-n5XszdLX1YXBqS_zW0f-vvWKwdh3eCJA%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] why not parallel seq scan for slow functions
On Thu, Sep 21, 2017 at 2:35 AM, Jeff Janes wrote:
> On Tue, Sep 19, 2017 at 9:15 PM, Amit Kapila wrote:
>> On Wed, Sep 20, 2017 at 3:05 AM, Jeff Janes wrote:
>>> On Tue, Sep 19, 2017 at 1:17 PM, Thomas Munro wrote:
>>>> On Thu, Sep 14, 2017 at 3:19 PM, Amit Kapila wrote:
>>>>> The attached patch fixes both the review comments as discussed
>>>>> above.
>>>
>>> that should be fixed by turning costs on the explain, as is the
>>> tradition.
>>
>> Right. BTW, did you get a chance to run the original test (for which
>> you have reported the problem) with this patch?
>
> Yes, this patch makes it use a parallel scan, with great improvement.

Thanks for the confirmation. Find rebased patch attached.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

parallel_paths_include_tlist_cost_v5.patch
Description: Binary data
Re: [HACKERS] parallelize queries containing initplans
On Mon, Oct 30, 2017 at 10:07 AM, Robert Haas wrote:
> On Mon, Oct 30, 2017 at 9:00 AM, Amit Kapila wrote:
>> Now that the PARAM_EXTERN issue is fixed, I have rebased this patch.
>> This patch had been switched to Ready For Committer in the last CF,
>> then Robert had comments which I have addressed, so I think the status
>> should be switched back to Ready For Committer. Let me know if you
>> think it should be switched to some other status.
>
> The change to ExplainPrintPlan doesn't look good to me, because it
> actually moves the initPlan; I don't think it's good for EXPLAIN to
> mutate the plan state tree. It should find a way to display the
> results *as if* the initPlans were attached to the subnode, but
> without actually moving them.

Actually, with the latest patch, we don't need these changes in
ExplainPrintPlan. Earlier, we needed these changes because the patch
had changed SS_charge_for_initplans to mark a path with an initplan as
parallel safe. However, after removing that change in the previous
version of the patch [1], this is not required, as now we won't add a
Gather on top of a plan node having an initplan.

[1] - https://www.postgresql.org/message-id/CAA4eK1JD%3DpJYBn8rN5RimiEVtPJmVNmyq5p6VoZBnUw2xRYB7w%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

pq_pushdown_initplan_v14.patch
Description: Binary data
Re: [HACKERS] Setting pd_lower in GIN metapage
On Fri, Nov 3, 2017 at 2:54 AM, Tom Lane wrote: > Amit Langote writes: >> On 2017/09/26 16:30, Michael Paquier wrote: >>> Cool, let's switch it back to a ready for committer status then. > >> Sure, thanks. > > Pushed with some cosmetic adjustments --- mostly, making the comments more > explicit about why we need the apparently-redundant assignments to > pd_lower. > > I've marked the CF entry closed. However, I'm not sure if we're quite > done with this thread. Weren't we going to adjust nbtree and hash to > be more aggressive about labeling their metapages as REGBUF_STANDARD? > I have already posted the patches [1] for the same in this thread, and they have been reviewed [2][3] as well. I have adjusted the comments as per the latest commit. Please find the updated patches attached. [1] - https://www.postgresql.org/message-id/CAA4eK1%2BTKg6ZK%3DmF14x_wf2KrmOxoMJ6z7YUK3-78acaYLwQ8Q%40mail.gmail.com [2] - https://www.postgresql.org/message-id/CAB7nPqTE4-GCaLtDh%3DJBcgUKR6B5WkvRLC-NpOqkgybi4FhHPw%40mail.gmail.com [3] - https://www.postgresql.org/message-id/CAB7nPqQBtDW43ABnWEdoHP6A2ToedzDFdpykbGjpO2wuZNiQnw%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com change_metapage_usage_btree-v3.patch Description: Binary data change_metapage_usage_hash-v3.patch Description: Binary data
Re: [HACKERS] Parallel worker error
On Mon, Oct 30, 2017 at 10:04 AM, Robert Haas wrote: > On Mon, Oct 30, 2017 at 8:25 AM, Amit Kapila wrote: >> Thanks. I have closed this entry in the CF app; however, I am not sure >> what the best way is to deal with the entry present on the PostgreSQL 10 >> Open Items page [1]. The entry for this bug seems to be present in the >> Older Bugs section. Shall we leave it as it is or do we want to do >> something else with it? >> >> [1] - https://wiki.postgresql.org/wiki/PostgreSQL_10_Open_Items > > How about just adding an additional bullet point for that item that > says "fixed by commit blah blah for 10.1"? > Sounds good, so I updated the Open Items page accordingly. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] parallelize queries containing initplans
On Wed, Oct 18, 2017 at 2:06 PM, tushar wrote: > On 10/11/2017 12:42 PM, tushar wrote: >> >> On 10/09/2017 03:26 PM, Amit Kapila wrote: >>> >>> I have reverted the check >>> in the attached patch. >> >> >> I have applied this patch against PG HEAD, run sqlsmith, and analyzed the >> results. I didn't find any specific failures against this patch. >> > I did some testing of this feature and wrote a few testcases. PFA the sql > file (along with the expected .out file). > Thanks a lot, Tushar, for testing this patch. In the latest patch, I have just rebased some comments; there is no code change, so I don't expect any change in behavior, but feel free to test it once again. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] parallelize queries containing initplans
On Wed, Oct 11, 2017 at 9:24 PM, Robert Haas wrote: > On Mon, Oct 9, 2017 at 5:56 AM, Amit Kapila wrote: >> How about always returning false for PARAM_EXTERN? > > Yeah, I think that's what we should do. Let's do that first as a > separate patch, which we might even want to back-patch, then return to > this. > Now that the PARAM_EXTERN issue is fixed, I have rebased this patch. This patch had been switched to Ready For Committer in the last CF, then Robert had comments which I have addressed, so I think the status should be switched back to Ready For Committer. Let me know if you think it should be switched to some other status. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com pq_pushdown_initplan_v13.patch Description: Binary data
Re: [HACKERS] Parallel worker error
On Sun, Oct 29, 2017 at 1:15 PM, Robert Haas wrote: > On Sun, Oct 29, 2017 at 12:02 PM, Amit Kapila wrote: >> This patch no longer applies, so attached rebased patches. I have >> also created patches for v10 as we might want to backpatch the fix. >> Added the patch in CF (https://commitfest.postgresql.org/15/1342/) > > Thanks. I picked the second variant, committed, and also back-patched to 9.6. > Thanks. I have closed this entry in the CF app; however, I am not sure what the best way is to deal with the entry present on the PostgreSQL 10 Open Items page [1]. The entry for this bug seems to be present in the Older Bugs section. Shall we leave it as it is or do we want to do something else with it? [1] - https://wiki.postgresql.org/wiki/PostgreSQL_10_Open_Items -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel safety for extern params
On Sun, Oct 29, 2017 at 9:55 PM, Robert Haas wrote: > On Sat, Oct 28, 2017 at 8:02 PM, Amit Kapila wrote: >> I think we need to make changes in exec_simple_recheck_plan to make >> the behavior similar to HEAD. With the attached patch, all tests >> passed with force_parallel_mode. > > Committed to REL_10_STABLE. > Thanks, I have closed this entry in the CF app. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel worker error
On Sat, Sep 9, 2017 at 7:56 AM, Amit Kapila wrote: > On Fri, Sep 8, 2017 at 3:13 PM, Robert Haas wrote: >> On Fri, Sep 8, 2017 at 1:17 AM, Amit Kapila wrote: >>> You are right. I have changed the ordering and passed OuterUserId via >>> FixedParallelState. >> >> This looks a little strange: >> >> +SetCurrentRoleId(fps->outer_user_id, fps->is_current_user_superuser); >> >> The first argument says "outer" but the second says "current". I'm >> wondering if we can just make the second one is_superuser. >> > > No issues; changed as per the suggestion. > >> I'm also wondering if, rather than using GetConfigOptionByName, we >> should just make the GUC underlying is_superuser non-static and use >> the value directly. If not, then I'm alternatively wondering whether >> we should maybe use a less-generic name than varval. >> > > I think we can go either way. So I prepared patches with both > approaches. In fix_role_handling_parallel_worker_v3_1.patch, I have > changed the variable name, and in > fix_role_handling_parallel_worker_v3_2.patch, I have exposed the GUC > is_superuser. > This patch no longer applies, so attached rebased patches. I have also created patches for v10 as we might want to backpatch the fix. Added the patch in CF (https://commitfest.postgresql.org/15/1342/) -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com fix_role_handling_parallel_worker_v4_1.patch Description: Binary data fix_role_handling_parallel_worker_v4_2.patch Description: Binary data fix_role_handling_parallel_worker_10_v4_1.patch Description: Binary data fix_role_handling_parallel_worker_10_v4_2.patch Description: Binary data
Re: [HACKERS] Parallel safety for extern params
On Sat, Oct 28, 2017 at 2:02 AM, Robert Haas wrote: > On Mon, Oct 16, 2017 at 12:59 PM, Amit Kapila wrote: > When I tried back-porting the patch to v10 I discovered that the > plpgsql changes conflict heavily and that ripping them out all the way > produces regression failures under force_parallel_mode. I think I see > why that's happening but it's not obvious to me how to adapt > b73f1b5c29e0ace5a281bc13ce09dea30e1b66de to the v10 code. Do you want > to have a crack at it or should we just leave this as a master-only > fix? > I think we need to make changes in exec_simple_recheck_plan to make the behavior similar to HEAD. With the attached patch, all tests passed with force_parallel_mode. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com fix_parallel_safety_for_extern_params_v2.patch Description: Binary data
Re: [HACKERS] Re: Burst in WAL size when UUID is used as PK while full_page_writes are enabled
On Fri, Oct 27, 2017 at 5:46 PM, Thomas Kellerer wrote: > akapila wrote: > >> You might want to give a try with the hash index if you are planning >> to use PG10 and your queries involve equality operations. > > But you can't replace the PK index with a hash index, because hash indexes > don't support uniqueness. > That's true, but it wasn't mentioned in the mail that the hash index would be used for the primary key. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Burst in WAL size when UUID is used as PK while full_page_writes are enabled
On Fri, Oct 27, 2017 at 5:36 PM, Alvaro Herrera wrote: > Amit Kapila wrote: > >> You might want to give a try with the hash index if you are planning >> to use PG10 and your queries involve equality operations. > > So, btree indexes on monotonically increasing sequences don't write tons > of full page writes because typically the same page is touched many > times by insertions on each checkpoint cycle; so only one or very few > full page writes are generated for a limited number of index pages. > > With UUID you lose locality of access: each insert goes to a different > btree page, so you generate tons of full page writes because the number > of modified index pages is very large. > > With hash on monotonically increasing keys, my guess is that you get > behavior similar to btrees on UUID: the inserts are all over the place > in the index, so tons of full page writes. Am I wrong? > > With hash on UUID, the same thing should happen. Am I wrong? > If the bucket pages were decided merely based on the hash key, then what you are saying would be right. However, we mask the hash key with the high|low mask, due to which it falls on one of the existing pages in the hash index. Also, I suggested it based on some of the tests we have done on a UUID column, and the result was that most of the time the hash index size was smaller than the btree size. See pages 15-17 of the hash index presentation at the last PGCon [1]. [1] - https://www.pgcon.org/2017/schedule/attachments/458_durable-hash-indexes-postgresql.pdf -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] parallel.c oblivion of worker-startup failures
On Tue, Sep 19, 2017 at 8:47 AM, Amit Kapila wrote: > On Mon, Sep 18, 2017 at 10:00 PM, Tom Lane wrote: >> Amit Kapila writes: >>> Attached patch fixes these problems. >> >> Hmm, this patch adds a kill(notify_pid) after one call to >> ForgetBackgroundWorker, but the postmaster has several more such calls. >> Shouldn't they all notify the notify_pid? Should we move that >> functionality into ForgetBackgroundWorker itself, so we can't forget >> it again? >> > > Among other places, we already notify during > ReportBackgroundWorkerExit(). However, it seems to me that all the other > places, except where this patch has added a call to notify, don't need > such a call. The other places, like DetermineSleepTime and > ResetBackgroundWorkerCrashTimes, are called for a crashed worker, which I > don't think requires notification to the backend, as that backend > itself would have restarted. The other place where we call > ForgetBackgroundWorker is in maybe_start_bgworkers when rw_terminate > is set to true, which again seems to be either the case of a worker crash > or of someone having explicitly asked to terminate the worker, in which > case we already send a notification. > > I think we need to notify the backend on start, stop, and failure to > start a worker. OTOH, if it is harmless to send a notification even > after the worker has crashed, then we can just move that functionality > into ForgetBackgroundWorker itself, as that will simplify the code; in > fact, that is the first thing that occurred to me as well, but I > haven't done it that way as I was not sure if we want to send a > notification in all kinds of cases. > The patch still applies (with some hunks). I have added it to the CF [1] to avoid losing track. [1] - https://commitfest.postgresql.org/15/1341/ -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Burst in WAL size when UUID is used as PK while full_page_writes are enabled
On Fri, Oct 27, 2017 at 11:26 AM, sanyam jain wrote: > Hi, > > I was reading the blog > https://blog.2ndquadrant.com/on-the-impact-of-full-page-writes . > > My queries: > > How is the randomness of UUID likely to create new leaf pages in the btree index? > In my understanding, as the size of UUID is 128 bits, i.e. twice that of BIGSERIAL, > more pages will be required to store the same number of rows, and hence there can > be an increase in WAL size due to FPW. > When I compared the index sizes in a local setup, the UUID index was ~2x greater > in size. > You might want to give a try with the hash index if you are planning to use PG10 and your queries involve equality operations. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel safety for extern params
On Mon, Oct 16, 2017 at 4:29 PM, Amit Kapila wrote: > On Sat, Oct 14, 2017 at 1:51 AM, Robert Haas wrote: > >> I think the bug is in ExecGather(Merge): it assumes that if we're in >> parallel mode, it's OK to start workers. But actually, it shouldn't >> do this unless ExecutePlan ended up with use_parallel_mode == true, >> which isn't quite the same thing. I think we might need ExecutePlan >> to set a flag in the estate that ExecGather(Merge) can check. >> > > Attached patch fixes the problem in a suggested way. > > All the patches still apply and pass the regression tests. I have added this to the CF to avoid losing track of this patch. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] CurTransactionContext freed before transaction COMMIT ???
On Wed, Oct 25, 2017 at 8:04 PM, Gaddam Sai Ram wrote: > Thanks for the response, > > Can you check if CurTransactionContext is valid at that point? > > > Any Postgres function to check if CurTransactionContext is valid or not? > > To see if this problem is related to CurTransactionContext, can you try to > populate the list in TopMemoryContext and see if that works. > > > Did you mean TopTransactionContext? > No, I mean what I have written. I suspect in your case TopTransactionContext will be the same as CurTransactionContext because you don't have any subtransaction. > As of now, we don't free our dlist. We solely depend on Postgres to free our > dlist while it frees the TopTransactionContext. But if we do allocate in > TopMemoryContext, we need to take care of freeing our allocations. > Can't we do it temporarily to test? I am not suggesting making this a permanent change, rather a way to see the reason for the problem. > And one more issue is, we found this bug once in all the testing we did. So > trying to replay this bug seems very difficult. > Oh, then it is tricky. I think there is a good chance that this is an issue in your application code, where you probably haven't used memory contexts as required, so you probably need to figure out a way to reproduce this issue; otherwise, it might be difficult to track down the actual cause. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] CurTransactionContext freed before transaction COMMIT ???
On Tue, Oct 24, 2017 at 7:42 PM, Gaddam Sai Ram wrote: > > Hello people, > > We are implementing an in-memory index. As a part of that, during > index callbacks, under CurTransactionContext, we cache all the DMLs of a > transaction in a dlist (Postgres's doubly linked list). > We registered a transaction callback, and in the transaction > pre-commit event (XACT_EVENT_PRE_COMMIT), we iterate through all cached > DMLs (dlist) and populate our in-memory data structure. > --> For a detailed explanation of our implementation, please look > into the below thread. > https://www.postgresql.org/message-id/15f4ea99b34.e69a4e1b633.8937474039794097334%40zohocorp.com > --> It was working fine until I was greeted with a segmentation > fault while accessing the dlist! > --> On further examination I found that the dlist head is not NULL > and it is pointing to some address, but the container I extracted is > pointing to 0x7f7f7f7f7f7f7f5f, and I was not able to access any members of > my container. > --> https://wiki.postgresql.org/wiki/Developer_FAQ#Why_are_my_variables_full_of_0x7f_bytes.3F > From the above wiki, we came to the conclusion that the memory > context in which that dlist was present was freed. > And yes, CLOBBER_FREED_MEMORY is enabled by passing > --enable-cassert to enable asserts in my code. > --> Normally the memory context inside XACT_EVENT_PRE_COMMIT is > TopTransactionContext, but in this particular case the memory context was > 'MessageContext'. A little unusual! Not sure why! > --> So basically CurTransactionContext shouldn't get freed > before transaction COMMIT. > --> So what has actually happened here??? Kindly help us with > your insights! > Can you check if CurTransactionContext is valid at that point? To see if this problem is related to CurTransactionContext, can you try to populate the list in TopMemoryContext and see if that works. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Pluggable storage
On Wed, Oct 25, 2017 at 11:37 AM, Robert Haas wrote: > On Fri, Oct 20, 2017 at 5:47 AM, Amit Kapila wrote: >> I think what we need here is a way to register the satisfies function >> (SnapshotSatisfiesFunc) in SnapshotData for different storage engines. > > I don't see how that helps very much. SnapshotSatisfiesFunc takes a > HeapTuple as an argument, and it cares in detail about that tuple's > xmin, xmax, and infomask, and it sets hint bits. All of that is bad, > because an alternative storage engine is likely to use a different > format than HeapTuple and to not have hint bits (or at least not in > the same form we have them now). Also, it doesn't necessarily have a > Boolean answer to the question "can this snapshot see this tuple?". > It may be more like "given this TID, what tuple if any can I see > there?" or "given this tuple, what version of it would I see with this > snapshot?". > > Another thing to consider is that, if we could replace satisfiesfunc, > it would probably break some existing code. There are multiple places > in the code that compare snapshot->satisfies to > HeapTupleSatisfiesHistoricMVCC and HeapTupleSatisfiesMVCC. > > I think the storage API should just leave snapshots alone. If a > storage engine wants to call HeapTupleSatisfiesVisibility() with that > snapshot, it can do so. Otherwise it can switch on > snapshot->satisfies and handle each case however it likes. > How will it switch on satisfies at runtime? There are places where we might know which visibility function (*MVCC, *Dirty, etc.) needs to be called, but I think there are other places (like heap_fetch) where it is not clear, and we decide based on what is stored in snapshot->satisfies. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] BLK_DONE state in XLogReadBufferForRedoExtended
On Tue, Oct 17, 2017 at 7:53 AM, Michael Paquier wrote: > On Mon, Oct 16, 2017 at 9:50 PM, Amit Kapila wrote: >> If the above analysis is correct, then I think we can say that the row state >> in a page will be the same during recovery as it was when the original >> operation was performed if full_page_writes is enabled. I am not >> sure how much this can help in the current heap format, but this can help >> in zheap (undo based heap). > > If I understood that correctly, that looks like a sane assumption. For > REGBUF_NO_IMAGE you may need to be careful though with undo > operations. > Right, but as of now, we don't need to use this assumption with REGBUF_NO_IMAGE. >> In zheap, we are writing the complete tuple for a Delete operation in the undo >> so that we can reclaim the corresponding tuple space as soon as the >> deleting transaction is committed. Now, during recovery, we have to >> generate the complete undo record (which includes the entire tuple), and >> for that ideally we should write the complete tuple in the WAL, but >> instead of that, I think we can regenerate it from the original page. >> This is only applicable when full_page_writes is enabled; otherwise, >> a complete tuple is required in the WAL. > > Yeah, you should really try to support both modes as well. > I also think so. > Fortunately, it is possible to know if full page writes are enforced > at the moment a record is assembled and inserted, so you could rely on > that. > Yeah, but actually we need to know whether full page writes are enforced while forming the record (something like in log_heap_update). Now, ideally, to read the flags XLogCtlInsert->fullPageWrites or XLogCtlInsert->forcePageWrites, we need an insertion lock as we do in XLogInsertRecord. However, I think we don't need an insertion lock to read the values for this purpose; rather, we can use GetFullPageWriteInfo, which doesn't need a lock.
The reason is that if the value of doPageWrites is true while forming and assembling the WAL records, then we will include a copy of the page even if the value changes in XLogInsertRecord. OTOH, if it is false while forming and assembling the WAL records, then we would have to include the undo tuple in the WAL record anyway, which avoids the dependency on the full-page image, so even if doPageWrites changes to true in XLogInsertRecord, we don't need to worry. >> I am not sure how much of the above makes sense to anyone without a detailed >> explanation, but I thought I should give some background on why I >> asked this question. However, if anybody needs more explanation or >> sees any fault in the above understanding, please let me know. > > Thanks for clarifying, I was wondering about the reason behind the question > as well. It is the second time that I have seen the word zheap on -hackers, > and the first time was no longer than 2 days ago, by Robert. > This is a big undertaking and will take time to reach a stage where the whole project can be shared, but some of the important design points which are closely linked with existing technology can be discussed earlier to avoid making wrong assumptions. Thanks for having a discussion on this topic. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [POC] Faster processing at Gather node
On Thu, Oct 19, 2017 at 1:16 AM, Robert Haas wrote: > On Tue, Oct 17, 2017 at 5:39 PM, Andres Freund wrote: > >>b) Use shm_mq_sendv in tqueue.c by batching up insertions into the >> queue whenever it's not empty when a tuple is ready. > > Batching them with what? I hate to postpone sending tuples we've got; > that sounds nice in the case where we're sending tons of tuples at > high speed, but there might be other cases where it makes the leader > wait. > I think Rafia's latest patch on the thread [1] has done something similar, where tuples are sent as long as there is space in the shared memory queue, and then it turns to batching the tuples using local queues. [1] - https://www.postgresql.org/message-id/CAOGQiiNiMhq5Pg3LiYxjfi2B9eAQ_q5YjS%3DfHiBJmbSOF74aBQ%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [POC] Faster processing at Gather node
On Wed, Oct 18, 2017 at 3:09 AM, Andres Freund wrote: > Hi Everyone, > > On 2017-05-19 17:25:38 +0530, Rafia Sabih wrote: >> While analysing the performance of TPC-H queries for the newly developed >> parallel operators, viz, parallel index, bitmap heap scan, etc., we noticed >> that the time taken by the gather node is significant. On investigation, as per >> the current method it copies each tuple to the shared queue and notifies >> the receiver. Since this copying is done in the shared queue, there is a lot of >> locking and latching overhead. >> >> So, in this POC patch I tried to copy all the tuples into a local queue, thus >> avoiding all the locks and latches. Once the local queue is filled to >> its capacity, tuples are transferred to the shared queue. Once all the >> tuples are transferred, the receiver is notified about the same. >> >> With this patch I could see significant improvement in performance for >> simple queries, > > I've spent some time looking into this, and I'm not quite convinced this > is the right approach. > As per my understanding, the patch in this thread is dead (not required) after the patch posted by Rafia in the thread "Effect of changing the value for PARALLEL_TUPLE_QUEUE_SIZE" [1]. There seem to be two related problems in this area: the first is shared-memory queue communication overhead, and the second is that workers start to stall when the shared-memory queue gets full. We can observe the first problem in simple queries that use Gather and the second in Gather Merge kinds of scenarios. We are trying to resolve both problems with the patch posted in the other thread. I think there is some similarity with the patch posted in this thread, but it is different. I have mentioned something similar upthread as well. > Let me start by describing where I see the > current performance problems around gather stemming from. > > The observations here are made using > select * from t where i < 3000 offset 2999 - 1; > with Rafia's data.
That avoids slowdowns on the leader due to too many > rows printed out. Sometimes I'll also use > SELECT * FROM lineitem WHERE l_suppkey > '5012' OFFSET 10 LIMIT 1; > on tpch data to show the effects on wider tables. > > The precise query doesn't really matter, the observations here are more > general, I hope. > > 1) nodeGather.c re-projects every row from workers. As far as I can tell >that's now always exactly the same targetlist as it's coming from the >worker. Projection capability was added in 8538a6307049590 (without >checking whether it's needed afaict), but I think it is in turn often >obsoleted by 992b5ba30dcafdc222341505b072a6b009b248a7. My >measurement shows that removing the projection yields quite massive >speedups in queries like yours, which is not too surprising. > >I suspect this just needs a tlist_matches_tupdesc check + an if >around ExecProject(). And a test; right now tests pass unless >force_parallel_mode is used, even if just commenting out the >projection unconditionally. > +1. I think we should do something to avoid this. > >Commenting out the memory barrier - which is NOT CORRECT - improves >timing: >before: 4675.626 >after: 4125.587 > >The correct fix obviously is not to break latch correctness. I think >the big problem here is that we perform a SetLatch() for every read >from the latch. > >I think we should >a) introduce a batch variant for receiving like: > > extern shm_mq_result shm_mq_receivev(shm_mq_handle *mqh, > shm_mq_iovec *iov, int *iovcnt, > bool nowait) > > which then only does the SetLatch() at the end. And maybe if it > wrapped around. > >b) Use shm_mq_sendv in tqueue.c by batching up insertions into the > queue whenever it's not empty when a tuple is ready. > I think the patch posted in the other thread has tried to achieve such batching by using local queues; maybe we should try some other way.
> > Unfortunately the patch's "local worker queue" concept seems, to me, > like it's not quite addressing the structural issues, instead opting to > address them by adding another layer of queuing. > That is done to batch the tuples while sending them. Sure, we can do some of the other things as well, but I think the main advantage comes from batching the tuples in a smart way while sending them, and once that is done, we might not need many of the other optimizations. [1] - https://www.postgresql.org/message-id/CAOGQiiNiMhq5Pg3LiYxjfi2B9eAQ_q5YjS%3DfHiBJmbSOF74aBQ%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Pluggable storage
On Sat, Oct 14, 2017 at 1:09 AM, Alexander Korotkov wrote: > On Fri, Oct 13, 2017 at 9:41 PM, Robert Haas wrote: >> >> On Fri, Oct 13, 2017 at 1:59 PM, Peter Geoghegan wrote: >> >> Fully agreed. >> > >> > If we implement that interface, where does that leave EvalPlanQual()? > > > At first glance, it seems that pluggable storage should override > EvalPlanQualFetch(); the rest of EvalPlanQual() looks quite generic. > I think there is more to it. Currently, EState->es_epqTuple is a HeapTuple, which is filled as part of the EvalPlanQual mechanism and then used later during the scan. We need to make it pluggable in some way so that other heaps can work. We also need some work for EvalPlanQualFetchRowMarks, as that also seems to be tightly coupled with HeapTuple. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Pluggable storage
On Fri, Oct 13, 2017 at 1:58 PM, Haribabu Kommi wrote: > > On Fri, Oct 13, 2017 at 11:55 AM, Robert Haas wrote: >> >> On Thu, Oct 12, 2017 at 8:00 PM, Haribabu Kommi >> wrote: >> > Currently I added a snapshot_satisfies API to find out whether the tuple >> > satisfies the visibility or not with different types of visibility >> > routines. >> > I feel these >> > are somehow enough to develop different storage methods like UNDO. >> > The storage methods can decide internally how to provide the visibility. >> > >> > + amroutine->snapshot_satisfies[MVCC_VISIBILITY] = >> > HeapTupleSatisfiesMVCC; >> > + amroutine->snapshot_satisfies[SELF_VISIBILITY] = >> > HeapTupleSatisfiesSelf; >> > + amroutine->snapshot_satisfies[ANY_VISIBILITY] = HeapTupleSatisfiesAny; >> > + amroutine->snapshot_satisfies[TOAST_VISIBILITY] = >> > HeapTupleSatisfiesToast; >> > + amroutine->snapshot_satisfies[DIRTY_VISIBILITY] = >> > HeapTupleSatisfiesDirty; >> > + amroutine->snapshot_satisfies[HISTORIC_MVCC_VISIBILITY] = >> > HeapTupleSatisfiesHistoricMVCC; >> > + amroutine->snapshot_satisfies[NON_VACUUMABLE_VISIBILTY] = >> > HeapTupleSatisfiesNonVacuumable; >> > + >> > + amroutine->snapshot_satisfiesUpdate = HeapTupleSatisfiesUpdate; >> > + amroutine->snapshot_satisfiesVacuum = HeapTupleSatisfiesVacuum; >> > >> > Currently no changes are carried out in the snapshot logic, as that is kept >> > separate from the storage API. >> >> That seems like a strange choice of API. I think it should be more >> integrated with the scan logic. For example, if I'm doing an index >> scan, and I get a TID, then I should be able to just say "here's a >> TID, give me any tuples associated with that TID that are visible to >> the scan snapshot". Then for the current heap it will do >> heap_hot_search_buffer, and for zheap it will walk the undo chain and >> return the relevant tuple from the chain. > > > OK, understood. > I will check along these lines and come up with storage APIs.
> I think what we need here is a way to register a satisfies function (SnapshotSatisfiesFunc) in SnapshotData for each storage engine. That is the core API for deciding visibility with respect to different storage engines. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BLK_DONE state in XLogReadBufferForRedoExtended
On Fri, Oct 13, 2017 at 11:57 AM, Amit Kapila wrote: > On Fri, Oct 13, 2017 at 10:32 AM, Michael Paquier > wrote: >> On Thu, Oct 12, 2017 at 10:47 PM, Amit Kapila >> wrote: >>> Today, I was trying to think about cases when we can return BLK_DONE >>> in XLogReadBufferForRedoExtended. One thing that occurred to me is >>> that it can happen during the replay of WAL if the full_page_writes is >>> off. Basically, when the full_page_writes is on, then during crash >>> recovery, it will always first restore the full page image of page and >>> then apply the WAL corresponding to that page, so we will never hit >>> the case where the lsn of the page is greater than lsn of WAL record. >>> >>> Are there other cases for which we can get BLK_DONE state? Is it >>> mentioned somewhere in code/comments which I am missing? >> >> Remember the thread about meta page optimization... Some index AMs use >> XLogInitBufferForRedo() to redo their meta pages and those don't have >> a FPW so when doing crash recovery you may finish by not replaying a >> meta page if its LSN on the page header is newer than what's being >> replayed. >> > > I think for metapage usage, it will anyway apply the changes. See > _bt_restore_page. > >> That's another case where BLK_DONE can be reached, even if >> full_page_writes is on. >> > > Yeah and probably if someone uses REGBUF_NO_IMAGE. However, I was > mainly thinking about cases where caller is not doing anything to > prevent full_page_image explicitly. > > If the above analysis is correct, then I think we can say that the row state in a page will be the same during recovery as it was when the original operation was performed, provided full_page_writes is enabled. I am not sure how much this can help in the current heap format, but it can help in zheap (an undo-based heap). In zheap, we write the complete tuple to the undo log for a Delete operation so that we can reclaim the corresponding tuple space as soon as the deleting transaction commits. 
Now, during recovery, we have to generate the complete undo record (which includes the entire tuple), and for that we would ideally write the complete tuple in WAL; but instead of that, I think we can regenerate it from the original page. This is only applicable when full_page_writes is enabled; otherwise, the complete tuple is required in WAL. I am not sure how much of the above makes sense without a detailed explanation, but I thought I should give some background on why I asked this question. If anybody needs more explanation or sees any fault in the above understanding, please let me know. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: enabling parallel execution for cursors explicitly (experimental)
On Sat, Oct 14, 2017 at 11:44 PM, Tomas Vondra wrote: > Hi, > > > I propose is to add a new cursor option (PARALLEL), which would allow > parallel plans for that particular user-defined cursor. Attached is an > experimental patch doing this (I'm sure there are some loose ends). > How will this work for backward scans? > > Regarding (2), if the user suspends the cursor for a long time, bummer. > The parallel workers will remain assigned, doing nothing. I don't have > any idea how to get around that, but I don't see how we could do better. > One idea could be that whenever someone uses the PARALLEL option, we fetch and store all the data locally before returning control to the user (something like the WITH HOLD option). -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel safety for extern params
On Sat, Oct 14, 2017 at 1:51 AM, Robert Haas wrote: > On Fri, Oct 13, 2017 at 1:19 AM, Amit Kapila wrote: >> After fixing this problem, when I ran the regression tests with >> force_parallel_mode = regress, I saw multiple other failures. All the >> failures boil down to two kinds of cases: >> >> 1. There was an assumption while extracting simple expressions that >> the target list of gather node can contain constants or Var's. Now, >> once the above patch allows extern params as parallel-safe, that >> assumption no longer holds true. We need to handle params as well. >> Attached patch fix_simple_expr_interaction_gather_v1.patch handles >> that case. > > - * referencing the child node's output ... but setrefs.c might also have > - * copied a Const as-is. > + * referencing the child node's output or a Param... but setrefs.c might > + * also have copied a Const as-is. > > I think the Param case should be mentioned after "... but" not before > - i.e. referencing the child node's output... but setrefs.c might also > have copied a Const or Param is-is. > I am not sure if we can write the comment like that (.. copied a Const or Param as-is.) because fix_upper_expr_mutator in setrefs.c has a special handling for Var and Param where constants are copied as-is via expression_tree_mutator. Also as proposed, the handling for params is more like Var in exec_save_simple_expr. >> 2. We don't allow execution to use parallelism if the plan can be >> executed multiple times. This has been enforced in ExecutePlan, but >> it seems like that we miss to handle the case where we are already in >> parallel mode by the time we enforce that condition. So, what >> happens, as a result, is that it will allow to use parallelism when it >> shouldn't (when the same plan is executed multiple times) and lead to >> a crash. One way to fix is that we temporarily exit the parallel mode >> in such cases and reenter in the same state once the current execution >> is over. 
Attached patch fix_parallel_mode_nested_execution_v1.patch >> fixes this problem. > > This seems completely unsafe. If somebody's already entered parallel > mode, they are counting on having the parallel-mode restrictions > enforced until they exit parallel mode. We can't just disable those > restrictions for a while in the middle and then put them back. > Right. > I think the bug is in ExecGather(Merge): it assumes that if we're in > parallel mode, it's OK to start workers. But actually, it shouldn't > do this unless ExecutePlan ended up with use_parallel_mode == true, > which isn't quite the same thing. I think we might need ExecutePlan > to set a flag in the estate that ExecGather(Merge) can check. > Attached patch fixes the problem in a suggested way. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com fix_parallel_mode_nested_execution_v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BLK_DONE state in XLogReadBufferForRedoExtended
On Fri, Oct 13, 2017 at 10:32 AM, Michael Paquier wrote: > On Thu, Oct 12, 2017 at 10:47 PM, Amit Kapila wrote: >> Today, I was trying to think about cases when we can return BLK_DONE >> in XLogReadBufferForRedoExtended. One thing that occurred to me is >> that it can happen during the replay of WAL if the full_page_writes is >> off. Basically, when the full_page_writes is on, then during crash >> recovery, it will always first restore the full page image of page and >> then apply the WAL corresponding to that page, so we will never hit >> the case where the lsn of the page is greater than lsn of WAL record. >> >> Are there other cases for which we can get BLK_DONE state? Is it >> mentioned somewhere in code/comments which I am missing? > > Remember the thread about meta page optimization... Some index AMs use > XLogInitBufferForRedo() to redo their meta pages and those don't have > a FPW so when doing crash recovery you may finish by not replaying a > meta page if its LSN on the page header is newer than what's being > replayed. > I think for metapage usage, it will anyway apply the changes. See _bt_restore_page. > That's another case where BLK_DONE can be reached, even if > full_page_writes is on. > Yeah and probably if someone uses REGBUF_NO_IMAGE. However, I was mainly thinking about cases where caller is not doing anything to prevent full_page_image explicitly. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallelize queries containing initplans
On Wed, Oct 11, 2017 at 9:24 PM, Robert Haas wrote: > On Mon, Oct 9, 2017 at 5:56 AM, Amit Kapila wrote: >> How about always returning false for PARAM_EXTERN? > > Yeah, I think that's what we should do. Let's do that first as a > separate patch, which we might even want to back-patch, then return to > this. > Sure, I have started a new thread, as this fix led to some other failures. I hope that is okay. https://www.postgresql.org/message-id/CAA4eK1%2B_BuZrmVCeua5Eqnm4Co9DAXdM5HPAOE2J19ePbR912Q%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Parallel safety for extern params
As discussed in a nearby thread [1] (parallelize queries containing initplans), it appears that there are cases where queries referring to PARAM_EXTERN params are treated as parallel-restricted even though they should be parallel-safe. I have done some further investigation and found that for most cases they are actually treated as parallel-restricted (due to tests in max_parallel_hazard_walker). The cases where such queries are treated as parallel-safe are when eval_const_expressions_mutator converts the param to a const; so whenever we select the generic plan, parallelism is not used. One simple example is:

create table t1(c1 int, c2 char(500));
insert into t1 values(generate_series(1,1),'aaa');
analyze t1;
set force_parallel_mode = on;

regression=# prepare t1_select(int) as Select c1 from t1 where c1 < $1;
PREPARE
regression=# explain (costs off, analyze) execute t1_select(1);
                          QUERY PLAN
---
 Gather (actual time=35.252..44.727 rows= loops=1)
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Seq Scan on t1 (actual time=0.364..5.638 rows= loops=1)
         Filter: (c1 < 1)
         Rows Removed by Filter: 1
 Planning time: 2135.514 ms
 Execution time: 52.886 ms
(9 rows)

The next four executions give the same plan; however, the sixth execution gives the plan below:

regression=# explain (costs off, analyze) execute t1_select(10005);
                          QUERY PLAN
--
 Seq Scan on t1 (actual time=0.049..6.188 rows=1 loops=1)
   Filter: (c1 < $1)
 Planning time: 22577.404 ms
 Execution time: 7.612 ms
(4 rows)

Attached patch fix_parallel_safety_for_extern_params_v1.patch fixes this problem. Note that, for now, I have added a very simple test to the regression tests to cover the prepared statement case, but we might want to add something related to generic plan selection, as I have shown above. I am just not sure if it is a good idea to have multiple executions just to get the generic plan. After fixing this problem, when I ran the regression tests with force_parallel_mode = regress, I saw multiple other failures. 
All the failures boil down to two kinds of cases: 1. There was an assumption, while extracting simple expressions, that the target list of a Gather node can contain only constants or Vars. Once the above patch allows extern params as parallel-safe, that assumption no longer holds true; we need to handle Params as well. Attached patch fix_simple_expr_interaction_gather_v1.patch handles that case. 2. We don't allow execution to use parallelism if the plan can be executed multiple times. This is enforced in ExecutePlan, but it seems we fail to handle the case where we are already in parallel mode by the time we enforce that condition. As a result, parallelism is allowed when it shouldn't be (when the same plan is executed multiple times), leading to a crash. One way to fix this is to temporarily exit parallel mode in such cases and re-enter the same state once the current execution is over. Attached patch fix_parallel_mode_nested_execution_v1.patch fixes this problem. Thanks to Kuntal, who helped investigate the regression failures that resulted from treating PARAM_EXTERN params as parallel-safe. [1] - https://www.postgresql.org/message-id/CA%2BTgmobSN66o2_c76rY12JvS_sZa17zpKvpuyG-QzRvVLYE8Vg%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com fix_parallel_safety_for_extern_params_v1.patch Description: Binary data fix_simple_expr_interaction_gather_v1.patch Description: Binary data fix_parallel_mode_nested_execution_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] BLK_DONE state in XLogReadBufferForRedoExtended
Today, I was trying to think about cases when we can return BLK_DONE in XLogReadBufferForRedoExtended. One thing that occurred to me is that it can happen during the replay of WAL if full_page_writes is off. Basically, when full_page_writes is on, crash recovery will always first restore the full-page image of the page and then apply the WAL corresponding to that page, so we will never hit the case where the LSN of the page is greater than the LSN of the WAL record. Are there other cases in which we can get the BLK_DONE state? Is it mentioned somewhere in the code/comments that I am missing? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Append implementation
On Fri, Oct 6, 2017 at 12:03 PM, Amit Khandekar wrote: > On 6 October 2017 at 08:49, Amit Kapila wrote: >> >> Okay, but why not cheapest partial path? > > I gave some thought on this point. Overall I feel it does not matter > which partial path it should pick up. Right now the partial paths are > not ordered. But for non-partial paths sake, we are just choosing the > very last path. So in case of mixed paths, leader will get a partial > path, but that partial path would not be the cheapest path. But if we > also order the partial paths, the same logic would then pick up > cheapest partial path. The question is, should we also order the > partial paths for the leader ? > > The only scenario I see where leader choosing cheapest partial path > *might* show some benefit, is if there are some partial paths that > need to do some startup work using only one worker. I think currently, > parallel hash join is one case where it builds the hash table, but I > guess here also, we support parallel hash build, but not sure about > the status. > You also need to consider how merge join currently works in parallel (each worker needs to perform the whole work of the right side). I think there could be more scenarios where the startup cost is high and each parallel worker needs to do that work independently. For such plan, if leader starts it, it would be slow, and > no other worker would be able to help it, so its actual startup cost > would be drastically increased. (Another path is parallel bitmap heap > scan where the leader has to do something and the other workers wait. > But here, I think it's not much work for the leader to do). So > overall, to handle such cases, it's better for leader to choose a > cheapest path, or may be, a path with cheapest startup cost. We can > also consider sorting partial paths with decreasing startup cost. > Yeah, that sounds reasonable. -- With Regards, Amit Kapila. 
EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallelize queries containing initplans
On Sat, Oct 7, 2017 at 7:22 PM, Robert Haas wrote: > On Fri, Oct 6, 2017 at 7:08 AM, Amit Kapila wrote: >> I have fixed the other review comment related to using safe_param_list >> in the attached patch. I think I have fixed all comments given by >> you, but let me know if I have missed anything or you have any other >> comment. > > -Param *param = (Param *) node; > +if (list_member_int(context->safe_param_ids, ((Param *) > node)->paramid)) > +return false; > > -if (param->paramkind != PARAM_EXEC || > -!list_member_int(context->safe_param_ids, param->paramid)) > -{ > -if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context)) > -return true; > -} > -return false;/* nothing to recurse to */ > +if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context)) > +return true; > > I think your revised logic is wrong here because it drops the > paramkind test, and I don't see any justification for that. > I have dropped the check thinking that Param extern params will be always safe and for param exec params we now have a list of safe params, so we don't need this check and now again thinking about it, it seems I might not be right. OTOH, I am not sure if the check as written is correct for all cases, however, I think for the purpose of this patch, we can leave it as it is and discuss separately what should be the check (as suggested by you). I have reverted the check in the attached patch. > > But I'm also wondering if we're missing a trick here, because isn't a > PARAM_EXTERN parameter always safe, given SerializeParamList? > Right. > If so, > this || ought to be &&, but if that adjustment is needed, it seems > like a separate patch. > How will it work if, during planning, we encounter param_extern param in any list? Won't it return true in that case, because param extern params will not be present in safe_param_ids, so this check will be true and then max_parallel_hazard_test will also return true? How about always returning false for PARAM_EXTERN? 
-- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com pq_pushdown_initplan_v12.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallelize queries containing initplans
On Fri, Oct 6, 2017 at 2:32 AM, Robert Haas wrote: > On Thu, Oct 5, 2017 at 1:16 PM, Amit Kapila wrote: >> On Thu, Oct 5, 2017 at 6:08 PM, Robert Haas wrote: >>> On Thu, Oct 5, 2017 at 5:52 AM, Amit Kapila wrote: >>>> Now, unless, I am missing something here, it won't be possible to >>>> detect params in such cases during forming of join rels and hence we >>>> need the tests in generate_gather_paths. Let me know if I am missing >>>> something in this context or if you have any better ideas to make it >>>> work? >>> >>> Hmm, in a case like this, I think we shouldn't need to detect it. The >>> Var is perfectly parallel-safe, the problem is that we can't use a >>> not-safe plan for the inner rel. I wonder why that's happening >>> here... >> >> It is because the inner rel (Result Path) contains a reference to a >> param which appears to be at the same query level. Basically due to >> changes in max_parallel_hazard_walker(). > > I spent several hours debugging this issue this afternoon. > Thanks a lot for spending time. > I think > you've misdiagnosed the problem. I think that the Param reference in > the result path is parallel-safe; that doesn't seem to me to be wrong. > If we see a Param reference for our own query level, then either we're > below the Gather and the new logic added by this patch will pass down > the value or we're above the Gather and we can access the value > directly. Either way, no problem. > > However, I think that if you attach an InitPlan to a parallel-safe > plan, it becomes parallel-restricted. This is obvious in the case > where the InitPlan's plan isn't itself parallel-safe, but it's also > arguably true even when the InitPlan's plan *is* parallel-safe, > because pushing that below a Gather introduces a multiple-evaluation > hazard. I think we should fix that problem someday by providing a > DSA-based parameter store, but that's a job for another day. 
> > And it turns out that we actually have such logic already, but this > patch removes it: > > @@ -2182,7 +2181,6 @@ SS_charge_for_initplans(PlannerInfo *root, > RelOptInfo *final_rel) > > path->startup_cost += initplan_cost; > path->total_cost += initplan_cost; > - path->parallel_safe = false; > } > > /* We needn't do set_cheapest() here, caller will do it */ > Yeah, that is a mistake on my side. I thought we don't need it now, as we pass the computed value of initplan params for valid cases, and for the other cases we can prohibit them, as was done in the patch. > Now, the header comment for SS_charge_for_initplans() is wrong; it > claims we can't transmit initPlans to workers, but I think that's > already wrong even before this patch. > What exactly do you want to convey as part of that comment? I have fixed the other review comment related to using safe_param_list in the attached patch. I think I have addressed all of your comments, but let me know if I have missed anything or you have any other comments. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com pq_pushdown_initplan_v11.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Append implementation
On Thu, Oct 5, 2017 at 4:11 PM, Amit Khandekar wrote: > > Ok. How about removing pa_all_partial_subpaths altogether , and > instead of the below condition : > > /* > * If all the child rels have partial paths, and if the above Parallel > * Append path has a mix of partial and non-partial subpaths, then consider > * another Parallel Append path which will have *all* partial subpaths. > * If enable_parallelappend is off, make this one non-parallel-aware. > */ > if (partial_subpaths_valid && !pa_all_partial_subpaths) > .. > > Use this condition : > if (partial_subpaths_valid && pa_nonpartial_subpaths != NIL) > .. > Sounds good to me. One minor point: + if (!node->as_padesc) + { + /* + */ + if (!exec_append_seq_next(node)) + return ExecClearTuple(node->ps.ps_ResultTupleSlot); + } It seems either you want to add a comment in the above part of the patch, or you left the empty /* */ there by mistake. > > > > Regarding a mix of partial and non-partial paths, I feel it always > makes sense for the leader to choose the partial path. > Okay, but why not the cheapest partial path? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallelize queries containing initplans
On Thu, Oct 5, 2017 at 6:08 PM, Robert Haas wrote: > On Thu, Oct 5, 2017 at 5:52 AM, Amit Kapila wrote: >> Now, unless, I am missing something here, it won't be possible to >> detect params in such cases during forming of join rels and hence we >> need the tests in generate_gather_paths. Let me know if I am missing >> something in this context or if you have any better ideas to make it >> work? > > Hmm, in a case like this, I think we shouldn't need to detect it. The > Var is perfectly parallel-safe, the problem is that we can't use a > not-safe plan for the inner rel. I wonder why that's happening > here... > It is because the inner rel (Result Path) contains a reference to a param which appears to be at the same query level. Basically due to changes in max_parallel_hazard_walker(). -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Append implementation
On Thu, Oct 5, 2017 at 6:14 PM, Robert Haas wrote: > On Thu, Oct 5, 2017 at 6:29 AM, Amit Kapila wrote: >> Okay, but can't we try to pick the cheapest partial path for master >> backend and maybe master backend can try to work on a partial path >> which is already picked up by some worker. > > Well, the master backend is typically going to be the first process to > arrive at the Parallel Append because it's already running, whereas > the workers have to start. > Sure, the leader can pick the cheapest partial path to start with. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Append implementation
On Mon, Oct 2, 2017 at 6:21 PM, Robert Haas wrote: > On Sun, Oct 1, 2017 at 9:55 AM, Amit Kapila wrote: >> Isn't it for both? I mean it is about comparing the non-partial paths >> for child relations of the same relation and also when there are >> different relations involved as in Union All kind of query. In any >> case, the point I was trying to say is that generally non-partial >> relations will have relatively smaller scan size, so probably should >> take lesser time to complete. > > I don't think that's a valid inference. It's true that a relation > could fail to have a partial path because it's small, but that's only > one reason among very many. The optimal index type could be one that > doesn't support parallel index scans, for example. > Valid point. >> Sure, I think it is quite good if we can achieve that but it seems to >> me that we will not be able to achieve that in all scenario's with the >> patch and rather I think in some situations it can result in leader >> ended up picking the costly plan (in case when there are all partial >> plans or mix of partial and non-partial plans). Now, we are ignoring >> such cases based on the assumption that other workers might help to >> complete master backend. I think it is quite possible that the worker >> backends picks up some plans which emit rows greater than tuple queue >> size and they instead wait on the master backend which itself is busy >> in completing its plan. So master backend will end up taking too much >> time. If we want to go with a strategy of master (leader) backend and >> workers taking a different strategy to pick paths to work on, then it >> might be better if we should try to ensure that master backend always >> starts from the place which has cheapest plans irrespective of whether >> the path is partial or non-partial. > > I agree that it's complicated to get this right in all cases; I'm > realizing that's probably an unattainable ideal. 
> > However, I don't think ignoring the distinction between partial and > non-partial plans is an improvement, because the argument that other > workers may be able to help the leader is a correct one. You are > correct in saying that the workers might fill up their tuple queues > while the leader is working, but once the leader returns one tuple it > will switch to reading from the queues. Every other tuple can be > returned by some worker. On the other hand, if the leader picks a > non-partial plan, it must run that plan through to completion. > > Imagine (a) a non-partial path with a cost of 1000 returning 100 > tuples and (b) a partial path with a cost of 10,000 returning 1,000 > tuples. No matter which path the leader picks, it has about 10 units > of work to do to return 1 tuple. However, if it picks the first path, > it is committed to doing 990 more units of work later, regardless of > whether the workers have filled the tuple queues or not. If it picks > the second path, it again has about 10 units of work to do to return 1 > tuple, but it hasn't committed to doing all the rest of the work of > that path. So that's better. > > Now it's hard to get all of the cases right. If the partial path in > the previous example had a cost of 1 crore then even returning 1 tuple > is more costly than running the whole non-partial path. When you > introduce partition-wise join and parallel hash, there are even more > problems. Now the partial path might have a large startup cost, so > returning the first tuple may be very expensive, and that work may > help other workers (if this is a parallel hash) or it may not (if this > is a non-parallel hash). > Yeah, these were the type of cases I am also worried. > I don't know that we can get all of these > cases right, or that we should try. 
However, I still think that > picking partial paths preferentially is sensible -- we don't know all > the details, but we do know that they're typically going to at least > try to divide up the work in a fine-grained fashion, while a > non-partial path, once started, the leader must run it to completion. > Okay, but can't we try to pick the cheapest partial path for the master backend? And maybe the master backend can try to work on a partial path that has already been picked up by some worker. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallelize queries containing initplans
On Wed, Oct 4, 2017 at 12:55 PM, Amit Kapila wrote: > On Wed, Oct 4, 2017 at 3:40 AM, Robert Haas wrote: >> On Tue, Oct 3, 2017 at 7:33 AM, Amit Kapila wrote: >> >> Having said all that, I think that this patch only wants to handle the >> subset of cases (2) and (4) where the relevant InitPlan is attached >> ABOVE the Gather node -- which seems very reasonable, because >> evaluating an InitPlan at a level of the plan tree above the level at >> which it is defined sounds like it might be complex. But I still >> don't quite see why we need these tests. I mean, if we only allow >> Param references from a set of safe parameter IDs, and the safe >> parameter IDs include only those IDs that can be generated in a >> worker, then won't other InitPlans in the workers anyway be ruled out? > > It doesn't happen always. There are cases when the part of required > conditions are pushed one query level below, so when we check during > max_parallel_hazard_walker, they look safe, but actually, they are > not. I will try to explain by example: > > postgres=# explain (costs off, verbose) select * from t1 where t1.i in > ( select 1 + (select max(j) from t3)); > QUERY PLAN > -- > Hash Semi Join >Output: t1.i, t1.j, t1.k >Hash Cond: (t1.i = ((1 + $1))) >-> Seq Scan on public.t1 > Output: t1.i, t1.j, t1.k >-> Hash > Output: ((1 + $1)) > -> Result >Output: (1 + $1) >InitPlan 1 (returns $1) > -> Finalize Aggregate >Output: max(t3.j) >-> Gather > Output: (PARTIAL max(t3.j)) > Workers Planned: 2 > -> Partial Aggregate >Output: PARTIAL max(t3.j) >-> Parallel Seq Scan on public.t3 > Output: t3.j > (19 rows) > > In the above example, you can see that the condition referring to > initplan (1 + $1) is pushed one level below. So when it tries to > check parallel safety for the join condition, it won't see Param node. > To add more detail, the param node in join qual is replaced with Var during pullup of sublink. 
Basically, it constructs Vars from the target list of the subquery and then substitutes them into the join quals. Refer to this code:

convert_ANY_sublink_to_join()
{
..
	/*
	 * Build a list of Vars representing the subselect outputs.
	 */
	subquery_vars = generate_subquery_vars(root,
										   subselect->targetList,
										   rtindex);

	/*
	 * Build the new join's qual expression, replacing Params with these Vars.
	 */
	quals = convert_testexpr(root, sublink->testexpr, subquery_vars);
..
}

Now, unless I am missing something here, it won't be possible to detect params in such cases during the forming of join rels, and hence we need the tests in generate_gather_paths. Let me know if I am missing something in this context or if you have any better ideas to make it work. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
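To make the substitution that convert_testexpr() performs more concrete, here is a hedged, self-contained sketch. It is a toy model, not PostgreSQL code: the node types and the function name replace_params_with_vars are invented for illustration, and real Param/Var/OpExpr nodes carry much more state.

```c
#include <assert.h>
#include <stddef.h>

/* Toy expression tree, invented for illustration only. */
typedef enum { T_PARAM, T_VAR, T_OP } Tag;

typedef struct Expr
{
	Tag			tag;
	int			id;			/* param number or var number */
	struct Expr *left;
	struct Expr *right;
} Expr;

/*
 * Walk the tree and turn every Param leaf into the Var built for the
 * corresponding subquery output -- the spirit of the substitution that
 * convert_testexpr() performs on the sublink's test expression.
 */
static void
replace_params_with_vars(Expr *e, const int subquery_var_ids[])
{
	if (e == NULL)
		return;
	if (e->tag == T_PARAM)
	{
		e->tag = T_VAR;
		e->id = subquery_var_ids[e->id];	/* Param $n -> n-th output Var */
		return;
	}
	replace_params_with_vars(e->left, subquery_var_ids);
	replace_params_with_vars(e->right, subquery_var_ids);
}
```

After this rewrite, a later walker looking for Param nodes in the join qual finds none, which mirrors why max_parallel_hazard_walker judges the pushed-down condition safe.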
Re: [HACKERS] parallelize queries containing initplans
On Wed, Oct 4, 2017 at 12:55 PM, Amit Kapila wrote: > On Wed, Oct 4, 2017 at 3:40 AM, Robert Haas wrote: >> On Tue, Oct 3, 2017 at 7:33 AM, Amit Kapila wrote: >> >> Having said all that, I think that this patch only wants to handle the >> subset of cases (2) and (4) where the relevant InitPlan is attached >> ABOVE the Gather node -- which seems very reasonable, because >> evaluating an InitPlan at a level of the plan tree above the level at >> which it is defined sounds like it might be complex. But I still >> don't quite see why we need these tests. I mean, if we only allow >> Param references from a set of safe parameter IDs, and the safe >> parameter IDs include only those IDs that can be generated in a >> worker, then won't other InitPlans in the workers anyway be ruled out? > > It doesn't happen always. There are cases when the part of required > conditions are pushed one query level below, so when we check during > max_parallel_hazard_walker, they look safe, but actually, they are > not. I will try to explain by example: > > postgres=# explain (costs off, verbose) select * from t1 where t1.i in > ( select 1 + (select max(j) from t3)); If you want to reproduce this case, then you can use the script posted by Kuntal up thread [1]. [1] - https://www.postgresql.org/message-id/CAGz5QC%2BuHOq78GCika3fbgRyN5zgiDR9Dd1Th5kENF%2BUpnPomQ%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallelize queries containing initplans
On Wed, Oct 4, 2017 at 3:40 AM, Robert Haas wrote: > On Tue, Oct 3, 2017 at 7:33 AM, Amit Kapila wrote: > > Having said all that, I think that this patch only wants to handle the > subset of cases (2) and (4) where the relevant InitPlan is attached > ABOVE the Gather node -- which seems very reasonable, because > evaluating an InitPlan at a level of the plan tree above the level at > which it is defined sounds like it might be complex. But I still > don't quite see why we need these tests. I mean, if we only allow > Param references from a set of safe parameter IDs, and the safe > parameter IDs include only those IDs that can be generated in a > worker, then won't other InitPlans in the workers anyway be ruled out? It doesn't always happen. There are cases where part of the required conditions is pushed one query level below, so when we check during max_parallel_hazard_walker, they look safe, but actually they are not. I will try to explain by example:

postgres=# explain (costs off, verbose) select * from t1 where t1.i in
( select 1 + (select max(j) from t3));
                         QUERY PLAN
------------------------------------------------------------
 Hash Semi Join
   Output: t1.i, t1.j, t1.k
   Hash Cond: (t1.i = ((1 + $1)))
   ->  Seq Scan on public.t1
         Output: t1.i, t1.j, t1.k
   ->  Hash
         Output: ((1 + $1))
         ->  Result
               Output: (1 + $1)
               InitPlan 1 (returns $1)
                 ->  Finalize Aggregate
                       Output: max(t3.j)
                       ->  Gather
                             Output: (PARTIAL max(t3.j))
                             Workers Planned: 2
                             ->  Partial Aggregate
                                   Output: PARTIAL max(t3.j)
                                   ->  Parallel Seq Scan on public.t3
                                         Output: t3.j
(19 rows)

In the above example, you can see that the condition referring to the initplan, (1 + $1), is pushed one level below. So when we check parallel safety for the join condition, we won't see the Param node. Now, consider what happens if we don't check contains_parallel_unsafe_param during generate_gather_paths: it will lead to the plan below.
postgres=# explain (costs off, verbose) select * from t1 where t1.i in
( select 1 + (select max(j) from t3));
                         QUERY PLAN
------------------------------------------------------------
 Gather
   Output: t1.i, t1.j, t1.k
   Workers Planned: 2
   ->  Hash Semi Join
         Output: t1.i, t1.j, t1.k
         Hash Cond: (t1.i = ((1 + $1)))
         ->  Parallel Seq Scan on public.t1
               Output: t1.i, t1.j, t1.k
         ->  Hash
               Output: ((1 + $1))
               ->  Result
                     Output: (1 + $1)
                     InitPlan 1 (returns $1)
                       ->  Finalize Aggregate
                             Output: max(t3.j)
                             ->  Gather
                                   Output: (PARTIAL max(t3.j))
                                   Workers Planned: 2
                                   ->  Partial Aggregate
                                         Output: PARTIAL max(t3.j)
                                         ->  Parallel Seq Scan on public.t3
                                               Output: t3.j
(22 rows)

This is wrong, because when we try to evaluate the params required at the Gather node, we won't get the required param, as there is no initplan at that level. > > If I am all mixed up, please help straighten me out. > I think whatever you said is right and very clear. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
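The "safe parameter IDs" idea quoted in this exchange can be sketched with a toy walker: before generating a Gather path, check that every Param referenced in the subtree is one whose value can be computed above the Gather and shipped to workers. This is an illustrative model only, assuming a simplified binary plan tree; none of these types or names exist in PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PARAMS 8			/* illustrative bound, not a real limit */

/* Toy plan node: records only which param IDs its expressions reference. */
typedef struct Node
{
	int			used_params[MAX_PARAMS];
	int			nused;
	struct Node *left;
	struct Node *right;
} Node;

/*
 * Return true only if every Param referenced below a proposed Gather is
 * in the safe set, i.e. its value can be evaluated above the Gather (by
 * an initplan attached there) and passed down to the workers.
 */
static bool
params_are_safe(const Node *n, const bool safe_ids[MAX_PARAMS])
{
	if (n == NULL)
		return true;
	for (int i = 0; i < n->nused; i++)
		if (!safe_ids[n->used_params[i]])
			return false;
	return params_are_safe(n->left, safe_ids) &&
		   params_are_safe(n->right, safe_ids);
}
```

In the broken plan above, the join qual references $1 but no initplan at or above the Gather defines it, which this kind of check would reject.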
Re: [HACKERS] parallelize queries containing initplans
On Mon, Oct 2, 2017 at 8:13 PM, Robert Haas wrote: > On Thu, Sep 14, 2017 at 10:45 PM, Amit Kapila wrote: >> The latest patch again needs to be rebased. Find rebased patch >> attached with this email. > > I read through this patch this morning. Copying Tom in the hopes > that he might chime in on the following two issues in particular: > It seems you forgot to include Tom, included now. > 1. Is there any superior alternative to adding ptype to ParamExecData? > I think the reason why we don't have this already is because the plan > tree that populates the Param must output the right type and the plan > tree that reads the Param must expect the right type, and there's no > need for anything else. But for serialization and deserialization > this seems to be necessary. I wonder whether it would be better to > try to capture this at the time the Param is generated (e.g. > var->vartype in assign_param_for_var) rather than derived at execution > time by applying exprType(), but I'm not sure how that would work > exactly, or whether it's really any better. > > 2. Do max_parallel_hazard_walker and set_param_references() have the > right idea about which parameters are acceptable candidates for this > optimization? The idea seems to be to collect the setParam sets for > all initplans between the current query level and the root. That > looks correct to me but I'm not an expert on this parameter stuff. > > Some other comments: > > - I think parallel_hazard_walker should likely work by setting > safe_param_ids to the right set of parameter IDs rather than testing > whether the parameter is safe by checking either whether it is in > safe_param_ids or some other condition is met. > Okay, I think it should work the way you are suggesting. I will give it a try. > - contains_parallel_unsafe_param() sounds like a function that merely > returns true or false, but it actually has major side effects. 
Those > side effects also look unsafe; mutating previously-generated paths can > corrupt the rel's pathlist, because it will no longer have the sort > order and other characteristics that add_path() creates and upon which > other code relies. > > - Can't is_initplan_is_below_current_query_level() be confused when > there are multiple subqueries in the tree? For example if the > toplevel query has subqueries a and b and a has a sub-subquery aa > which has an initplan, won't this function think that b has an > initplan below the current query level? If not, maybe a comment is in > order explaining why not? > We can discuss the above two points once you are convinced about whether we need any such functions, so let's first discuss that. > - Why do we even need contains_parallel_unsafe_param() and > is_initplan_is_below_current_query_level() in the first place, anyway? > I think there's been some discussion of that on this thread, but I'm > not sure I completely understand it, and the comments in the patch > don't help me understand why we now need this restriction. > This is to ensure that initplans are never below a Gather node. If we allow parallelism when an initplan is below the current query level, then (a) we need a way to pass the result of the initplan from one worker to the other workers and to the master backend, and (b) if the initplan contains a reference to some parameter above the current query level (undirect correlated plans), we need a mechanism to pass such parameters (basically, to make correlated initplans work). Now, one might think (a) and/or (b) are desirable so that the feature can be used in more cases, but based on TPC-H analysis we have found that it is quite useful even without those, and in fact the additional benefit after supporting those cases might not be significant.
The patch contains a few comments in generate_gather_paths and at the top of the functions contains_parallel_unsafe_param and is_initplan_is_below_current_query_level; if you feel it should be explained in further detail, then let me know. > - The new code in ExplainPrintPlan() needs a comment. > There was a comment, but I have added a somewhat more detailed comment now; check whether that makes it clear. > - I have typically referred in comments to "Gather or Gather Merge" > rather than "gather or gather merge". > Changed as per the suggestion. Also changed the function name is_initplan_is_below_current_query_level to is_initplan_below_current_query_level. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com pq_pushdown_initplan_v10.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] crash in RestoreLibraryState during low-memory testing
On Tue, Oct 3, 2017 at 8:31 AM, Amit Kapila wrote: > On Tue, Oct 3, 2017 at 3:04 AM, Andreas Seltenreich > wrote: >> Hi, >> >> doing low-memory testing with REL_10_STABLE at 1f19550a87 also produced >> a couple of parallel worker core dumps with the backtrace below. >> Although most of the backtrace is inside the dynamic linker, it looks >> like it was passed a pointer to gone-away shared memory. >> > > It appears to be some dangling pointer, but not sure how it is > possible. Can you provide some more details, like do you have any > other library which you want to get loaded in the backend (like by > using shared_preload_libraries or by some other way)? I think without > that we shouldn't try to load anything in the parallel worker. > Another possibility could be that the memory for the library space has been overwritten, either in the master backend or in the worker backend. I think that is possible in low-memory conditions if somewhere we write to memory without ensuring that the space is allocated. I have browsed the nearby code and didn't find any such instance. One idea to narrow down the problem is to see whether the other members in the worker backend are sane; for example, can you try printing the value of MyFixedParallelState, since we get that value from shared memory similarly to the library space? It seems from the call stack that the memory of the library space is corrupted, so we could move the call to lookup/RestoreLibraryState to immediately after we assign MyFixedParallelState. If even after this the memory for the library space is corrupted, then probably something bad has happened in the master backend. Any other ideas? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] crash in RestoreLibraryState during low-memory testing
On Tue, Oct 3, 2017 at 3:04 AM, Andreas Seltenreich wrote: > Hi, > > doing low-memory testing with REL_10_STABLE at 1f19550a87 also produced > a couple of parallel worker core dumps with the backtrace below. > Although most of the backtrace is inside the dynamic linker, it looks > like it was passed a pointer to gone-away shared memory. > It appears to be some dangling pointer, but I am not sure how that is possible. Can you provide some more details, such as whether you have any other library that gets loaded in the backend (for example, via shared_preload_libraries or some other way)? I think without that we shouldn't be trying to load anything in the parallel worker. Also, if you can get the failing query (check the server log), it would be great. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Append implementation
On Sat, Sep 30, 2017 at 9:25 PM, Robert Haas wrote: > On Sat, Sep 30, 2017 at 12:20 AM, Amit Kapila wrote: >> Okay, but the point is whether it will make any difference >> practically. Let us try to see with an example, consider there are >> two children (just taking two for simplicity, we can extend it to >> many) and first having 1000 pages to scan and second having 900 pages >> to scan, then it might not make much difference which child plan >> leader chooses. Now, it might matter if the first child relation has >> 1000 pages to scan and second has just 1 page to scan, but not sure >> how much difference will it be in practice considering that is almost >> the maximum possible theoretical difference between two non-partial >> paths (if we have pages greater than 1024 pages >> (min_parallel_table_scan_size) then it will have a partial path). > > But that's comparing two non-partial paths for the same relation -- > the point here is to compare across relations. Isn't it both? I mean, it is about comparing the non-partial paths for child relations of the same relation, and also about cases where different relations are involved, as in a UNION ALL kind of query. In any case, the point I was trying to make is that relations with non-partial paths will generally have a relatively small scan size, so they should probably take less time to complete. > Also keep in mind > scenarios like this: > > SELECT ... FROM relation UNION ALL SELECT ... FROM generate_series(...); > I think for the FunctionScan case, non-partial paths can be quite costly. >>> It's a lot fuzzier what is best when there are only partial plans. >>> >> >> The point that bothers me a bit is whether it is a clear win if we >> allow the leader to choose a different strategy to pick the paths or >> is this just our theoretical assumption. Basically, I think the patch >> will become simpler if pick some simple strategy to choose paths. > > Well, that's true, but is it really that much complexity?
> > And I actually don't see how this is very debatable. If the only > paths that are reasonably cheap are GIN index scans, then the only > strategy is to dole them out across the processes you've got. Giving > the leader the cheapest one seems to be to be clearly smarter than any > other strategy. > Sure, it would be quite good if we can achieve that, but it seems to me that we will not be able to achieve it in all scenarios with the patch; rather, I think in some situations it can result in the leader ending up picking a costly plan (when there are all partial plans, or a mix of partial and non-partial plans). Now, we are ignoring such cases based on the assumption that other workers might help the master backend complete. But it is quite possible that the worker backends pick up plans that emit more rows than the tuple queue size, and they then wait on the master backend, which is itself busy completing its own plan. So the master backend will end up taking too much time. If we want to go with the master (leader) backend and the workers using different strategies to pick paths to work on, then it might be better to ensure that the master backend always starts from the place that has the cheapest plans, irrespective of whether the path is partial or non-partial. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Append implementation
On Wed, Sep 20, 2017 at 10:59 AM, Amit Khandekar wrote: > On 16 September 2017 at 10:42, Amit Kapila wrote: >> >> At a broader level, the idea is good, but I think it won't turn out >> exactly like that considering your below paragraph which indicates >> that it is okay if leader picks a partial path that is costly among >> other partial paths as a leader won't be locked with that. >> Considering this is a good design for parallel append, the question is >> do we really need worker and leader to follow separate strategy for >> choosing next path. I think the patch will be simpler if we can come >> up with a way for the worker and leader to use the same strategy to >> pick next path to process. How about we arrange the list of paths >> such that first, all partial paths will be there and then non-partial >> paths and probably both in decreasing order of cost. Now, both leader >> and worker can start from the beginning of the list. In most cases, >> the leader will start at the first partial path and will only ever >> need to scan non-partial path if there is no other partial path left. >> This is not bulletproof as it is possible that some worker starts >> before leader in which case leader might scan non-partial path before >> all partial paths are finished, but I think we can avoid that as well >> if we are too worried about such cases. > > If there are no partial subpaths, then again the leader is likely to > take up the expensive subpath. And this scenario would not be > uncommon. > While thinking about how common the case of no partial subpaths would be, it occurred to me that as of now we always create a partial path for the inheritance child if it is parallel-safe and the user has not explicitly set the value of parallel_workers to zero (refer compute_parallel_worker). So, unless you are planning to change that, I think it will be quite uncommon to have no partial subpaths. Few nitpicks in your latest patch: 1. 
@@ -298,6 +366,292 @@ ExecReScanAppend(AppendState *node)
 	if (subnode->chgParam == NULL)
 		ExecReScan(subnode);
 }
+

Looks like a spurious line.

2.
@@ -1285,7 +1291,11 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
..
+	if (chosen_path && chosen_path != cheapest_partial_path)
+		pa_all_partial_subpaths = false;

This will keep setting pa_all_partial_subpaths to false for non-partial paths, which doesn't seem to be the purpose of this variable. I think you want it to be set even when there is just one non-partial path, so isn't it better to write it as below or something similar:

if (pa_nonpartial_subpaths && pa_all_partial_subpaths)
	pa_all_partial_subpaths = false;

-- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Append implementation
On Sat, Sep 30, 2017 at 4:02 AM, Robert Haas wrote: > On Fri, Sep 29, 2017 at 7:48 AM, Amit Kapila wrote: >> I think in general the non-partial paths should be cheaper as compared >> to partial paths as that is the reason planner choose not to make a >> partial plan at first place. I think the idea patch is using will help >> because the leader will choose to execute partial path in most cases >> (when there is a mix of partial and non-partial paths) and for that >> case, the leader is not bound to complete the execution of that path. >> However, if all the paths are non-partial, then I am not sure much >> difference it would be to choose one path over other. > > The case where all plans are non-partial is the case where it matters > most! If the leader is going to take a share of the work, we want it > to take the smallest share possible. > Okay, but the point is whether it will make any difference practically. Let us try to see with an example: consider there are two children (just taking two for simplicity; we can extend it to many), the first having 1000 pages to scan and the second having 900. Then it might not make much difference which child plan the leader chooses. Now, it might matter if the first child relation has 1000 pages to scan and the second has just 1 page, but I am not sure how much difference it will make in practice, considering that is almost the maximum possible theoretical difference between two non-partial paths (if a relation has more than 1024 pages (min_parallel_table_scan_size), then it will have a partial path). > It's a lot fuzzier what is best when there are only partial plans. > The point that bothers me a bit is whether it is a clear win to allow the leader to choose a different strategy to pick the paths, or whether this is just our theoretical assumption. Basically, I think the patch will become simpler if we pick some simple strategy to choose paths. -- With Regards, Amit Kapila. 
EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Append implementation
On Wed, Sep 20, 2017 at 10:59 AM, Amit Khandekar wrote: > On 16 September 2017 at 10:42, Amit Kapila wrote: >> On Thu, Sep 14, 2017 at 9:41 PM, Robert Haas wrote: >>> On Mon, Sep 11, 2017 at 9:25 AM, Amit Kapila >>> wrote: >>>> I think the patch stores only non-partial paths in decreasing order, >>>> what if partial paths having more costs follows those paths? >>> >>> The general picture here is that we don't want the leader to get stuck >>> inside some long-running operation because then it won't be available >>> to read tuples from the workers. On the other hand, we don't want to >>> just have the leader do no work because that might be slow. And in >>> most cast cases, the leader will be the first participant to arrive at >>> the Append node, because of the worker startup time. So the idea is >>> that the workers should pick expensive things first, and the leader >>> should pick cheap things first. >>> >> >> At a broader level, the idea is good, but I think it won't turn out >> exactly like that considering your below paragraph which indicates >> that it is okay if leader picks a partial path that is costly among >> other partial paths as a leader won't be locked with that. >> Considering this is a good design for parallel append, the question is >> do we really need worker and leader to follow separate strategy for >> choosing next path. I think the patch will be simpler if we can come >> up with a way for the worker and leader to use the same strategy to >> pick next path to process. How about we arrange the list of paths >> such that first, all partial paths will be there and then non-partial >> paths and probably both in decreasing order of cost. Now, both leader >> and worker can start from the beginning of the list. In most cases, >> the leader will start at the first partial path and will only ever >> need to scan non-partial path if there is no other partial path left. 
>> This is not bulletproof as it is possible that some worker starts >> before leader in which case leader might scan non-partial path before >> all partial paths are finished, but I think we can avoid that as well >> if we are too worried about such cases. > > If there are no partial subpaths, then again the leader is likely to > take up the expensive subpath. > I think in general the non-partial paths should be cheaper compared to partial paths, as that is the reason the planner chose not to make a partial plan in the first place. I think the idea the patch is using will help, because the leader will choose to execute a partial path in most cases (when there is a mix of partial and non-partial paths), and in that case the leader is not bound to complete the execution of that path. However, if all the paths are non-partial, then I am not sure how much difference it would make to choose one path over another. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
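The path-ordering proposal quoted above (all partial paths first, then non-partial ones, each group in decreasing cost order, with leader and workers both scanning from the front) could be expressed as a sort comparator along these lines. This is a hedged sketch with invented types; the real patch works on planner Path lists, not this toy struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy subpath descriptor; the real planner works with Path nodes. */
typedef struct SubPath
{
	double cost;
	bool   partial;
} SubPath;

/*
 * qsort comparator implementing the proposed ordering: all partial
 * paths before all non-partial ones, each group sorted by decreasing
 * cost.  Workers starting at the front grab expensive work first,
 * while the leader mostly ends up on partial paths, which it is not
 * obliged to run to completion.
 */
static int
subpath_cmp(const void *a, const void *b)
{
	const SubPath *pa = (const SubPath *) a;
	const SubPath *pb = (const SubPath *) b;

	if (pa->partial != pb->partial)
		return pa->partial ? -1 : 1;
	if (pa->cost != pb->cost)
		return (pa->cost > pb->cost) ? -1 : 1;
	return 0;
}
```

With this ordering, the leader only ever reaches a non-partial path when no partial path remains, which is the behavior discussed in the quoted message.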
Re: [HACKERS] Setting pd_lower in GIN metapage
On Mon, Sep 25, 2017 at 12:18 PM, Amit Langote wrote: > Hi. > > Trying to catch up. > > On 2017/09/25 13:43, Michael Paquier wrote: >> On Sun, Sep 24, 2017 at 2:25 PM, Amit Kapila wrote: >>> Added and updated the comments for both btree and hash index patches. >> >> I don't have real complaints about this patch, this looks fine to me. >> >> +* Currently, the advantage of setting pd_lower is in limited cases like >> +* during wal_consistency_checking or while logging for unlogged relation >> +* as for all other purposes, we initialize the metapage. Note, it also >> +* helps in page masking by allowing to mask unused space. >> I would have reworked this comment a bit, say like that: >> Setting pd_lower is useful for two cases which make use of WAL >> compressibility even if the meta page is initialized at replay: >> - Logging of init forks for unlogged relations. >> - wal_consistency_checking logs extra full-page writes, and this >> allows masking of the unused space of the page. >> >> Now I often get complains that I suck at this exercise ;) > > So, I think I understand the discussion so far and the arguments about > what we should write to explain why we're setting pd_lower to the correct > value. > > Just to remind, I didn't actually start this thread [1] to address the > issue that the FPWs of meta pages written to WAL are not getting > compressed. An external backup tool relies on pd_lower to give the > correct starting offset of the hole to compress, provided the page's other > fields suggest it has the standard page layout. Since we discovered that > pd_lower is not set correctly in gin, brin, and spgist meta pages, I > created patches to do the same. You (Michael) pointed out that that > actually helps compress their FPW images in WAL as well, so we began > considering that. 
Also, you pointed out that WAL checking code masks > pages based on the respective masking functions' assumptions about the > page's layout properties, which the original patches forgot to consider. > So, we updated the patches to change the respective masking functions to > mask meta pages as pages with standard page layout, too. > > But as Tom pointed out [2], WAL compressibility benefit cannot be obtained > unless we change how the meta page is passed to xlog.c to be written to > WAL. So, we found out all the places that write/register the meta page to > WAL and changed the respective XLogRegisterBuffer calls to include the > REGBUG_STANDARD flag. Some of these calls already passed > REGBUF_WILL_INIT, which would result in no FPW image to be written to the > WAL and so there would be no question of compressibility. But, passing > REGBUF_STANDARD would still help the case where WAL checking is enabled, > because FPW image *is* written in that case. > > So, ISTM, comments that the patches add should all say that setting the > meta pages' pd_lower to the correct value helps to pass those pages to > xlog.c as compressible standard layout pages, regardless of whether they > are actually passed that way. Even if the patches do take care of the > latter as well. > > Did I miss something? > > Looking at Amit K's updated patches for btree and hash, it seems that he > updated the comment to say that setting pd_lower to the correct value is > *essential*, because those pages are passed as REGBUF_STANDARD pages and > hence will be compressed. That seems to contradict what I wrote above, > but I think that's not wrong, too. > I think you are missing that there are many cases where we use only REGBUF_STANDARD for meta-pages (cf. hash index). For btree, where the advantage is in fewer cases, I have explicitly stated those cases. Do you still have confusion? -- With Regards, Amit Kapila. 
EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Setting pd_lower in GIN metapage
On Mon, Sep 25, 2017 at 10:13 AM, Michael Paquier wrote: > On Sun, Sep 24, 2017 at 2:25 PM, Amit Kapila wrote: >> Added and updated the comments for both btree and hash index patches. > > I don't have real complaints about this patch, this looks fine to me. > > +* Currently, the advantage of setting pd_lower is in limited cases like > +* during wal_consistency_checking or while logging for unlogged relation > +* as for all other purposes, we initialize the metapage. Note, it also > +* helps in page masking by allowing to mask unused space. > I would have reworked this comment a bit, say like that: > Setting pd_lower is useful for two cases which make use of WAL > compressibility even if the meta page is initialized at replay: > - Logging of init forks for unlogged relations. > - wal_consistency_checking logs extra full-page writes, and this > allows masking of the unused space of the page. > > Now I often get complains that I suck at this exercise ;) > I understand that, and I think there are always multiple ways to write the same information. It might be better to pass this patch series to the committer if you don't see any mistakes, because he might change some comments anyway before committing. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
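As background for why a correct pd_lower matters in this thread: for a page registered with REGBUF_STANDARD, a full-page image can omit the "hole" between pd_lower and pd_upper. The sketch below is a simplified model of that computation, not PostgreSQL's actual PageHeaderData or WAL-assembly code; the struct and function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ 8192				/* PostgreSQL's default block size */

/*
 * Simplified stand-in for the relevant fields of PageHeaderData.
 * pd_lower/pd_upper bound the free space ("hole") on a standard page.
 */
typedef struct ToyPageHeader
{
	uint16_t pd_lower;
	uint16_t pd_upper;
} ToyPageHeader;

/*
 * Bytes a full-page image may omit when the page has the standard
 * layout.  If a metapage leaves pd_lower at the bare page-header size,
 * it cannot safely be treated as standard (the hole would swallow the
 * metadata stored right after the header); setting pd_lower past the
 * metadata makes the rest of the page an omittable hole.
 */
static unsigned
fpw_hole_size(const ToyPageHeader *ph)
{
	if (ph->pd_lower >= ph->pd_upper)
		return 0;				/* no hole: full or non-standard page */
	return (unsigned) (ph->pd_upper - ph->pd_lower);
}
```

For a metapage whose header plus metadata occupy, say, 104 bytes (an assumed figure for illustration), a correctly set pd_lower lets the FPW skip the remaining BLCKSZ - 104 bytes.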
Re: [HACKERS] Page Scan Mode in Hash Index
On Fri, Sep 22, 2017 at 11:56 PM, Robert Haas wrote: > On Thu, Sep 21, 2017 at 3:08 AM, Ashutosh Sharma > wrote: >> I have added a note for handling of logged and unlogged tables in >> README file and also corrected the header comment for >> hashbucketcleanup(). Please find the attached 0003*.patch having these >> changes. Thanks. > > I committed 0001 and 0002 with some additional edits as > 7c75ef571579a3ad7a1d3ee909f11dba5e0b9440. > I have noticed a typo in that commit (in the README), and a patch for it is attached. > I also rebased 0003 and > edited it a bit; see attached hash-cleanup-changes.patch. > > I'm not entirely sold on 0003. An alternative would be to rip the lsn > stuff back out of HashScanPosData, and I think we ought to consider > that. Basically, 0003 is betting that getting rid of the > lock-chaining in hash index vacuum is more valuable than being able to > kill dead items more aggressively. I bet that's a bad bet. > > In the case of btree indexes, since > 2ed5b87f96d473962ec5230fd820abfeaccb2069, page-at-a-time scanning > allows most btree index scans to avoid holding buffer pins when the > scan is suspended, but we gain no such advantage here. We always have > to hold a pin on the primary bucket page anyway, so even with this > patch cleanup is going to block when it hits a bucket containing a > suspended scan. 0003 helps if (a) the relation is permanent, (b) the > bucket has overflow pages, and (c) the scan is moving faster than > vacuum and can overtake it instead of waiting. But that doesn't seem > like it will happen very often at all, whereas the LSN check will > probably fail frequently and cause us to skip cleanup that we could > usefully have done. So I propose the attached hashscan-no-lsn.patch > as an alternative. > I think your proposal makes sense. Your patch looks good, but you might want to tweak the comments atop _hash_kill_items ("However, having pin on the overflow page doesn't guarantee that vacuum won't delete any items."). 
That part of the comment was written to indicate that we have to check the LSN in this function unconditionally. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com typo_hash_readme_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Setting pd_lower in GIN metapage
On Wed, Sep 20, 2017 at 9:19 AM, Amit Kapila wrote: > On Wed, Sep 20, 2017 at 4:25 AM, Michael Paquier > wrote: >> On Wed, Sep 20, 2017 at 12:47 AM, Tom Lane wrote: >>> Amit Kapila writes: >>>> On Tue, Sep 19, 2017 at 9:27 AM, Michael Paquier >>>> wrote: >>>>> I am not saying that no index AMs take advantage FPW compressibility >>>>> for their meta pages. There are cases like this one, as well as one >>>>> code path in BRIN where this is useful, and it is useful as well when >>>>> logging FPWs of the init forks for unlogged relations. >>> >>>> Hmm, why is it useful for logging FPWs of the init forks for unlogged >>>> relations? We don't use REGBUF_STANDARD in those cases. >>> >>> But if we started to do so, that would be a concrete benefit of this >>> patch ... >> >> In the proposed set of patches, all the empty() routines part of index >> AMs which use log_newpage_buffer() (brin, gin, spgst) are doing the >> right thing by updating log_newpage_buffer(). btree also should have >> its call to log_newpage updated in btbuildempty(), and your patch is >> missing that. >> > > We can add that for btree patch. > Added and updated the comments for both btree and hash index patches. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com change_metapage_usage_btree-v2.patch Description: Binary data change_metapage_usage_hash-v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
On Thu, Sep 21, 2017 at 4:14 PM, Kyotaro HORIGUCHI wrote: > > postgres=# create table test (id int primary key, v text); > postgres=# create index on test using hash (id); > WARNING: hash indexes are not WAL-logged and their use is discouraged > > But not for unlogged tables. > > postgres=# create unlogged table test (id int primary key, v text); > postgres=# create index on test using hash (id); > postgres=# (no warning) > > And it fails on promotion in the same way. > > postgres=# select * from test; > ERROR: could not open file "base/13324/16446": No such file or directory > > indexcmds.c@965:503 >> if (strcmp(accessMethodName, "hash") == 0 && >> RelationNeedsWAL(rel)) >> ereport(WARNING, >> (errmsg("hash indexes are not WAL-logged and their use is >> discouraged"))); > > Using !RelationUsesLocalBuffers instead fixes that, and the > attached patch is for 9.6. I'm a bit unsure about relying on the > logical meaning of the macro, but what it does fits here. > I think giving a message like "hash indexes are not WAL-logged and .." for unlogged tables doesn't seem like good behavior. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Scan Mode in Hash Index
On Wed, Sep 20, 2017 at 6:44 PM, Robert Haas wrote: > On Wed, Sep 20, 2017 at 7:45 AM, Amit Kapila wrote: >> Right, I was thinking from the perspective of the index entry. Before >> marking index entry as dead, we do check for heaptid. So, as heaptid >> can't be reused via Page-at-a-time index vacuum, scan won't mark index >> entry as dead. > > It can mark index entries dead, but if it does, they correspond to > heap TIDs that are still dead, as opposed to heap TIDs that have been > resurrected by being reused for an unrelated tuple. > > In other words, the danger scenario is this: > > 1. A page-at-a-time scan records all the TIDs on a page. > 2. VACUUM processes the page, removing some of those TIDs. > 3. VACUUM finishes, changing the heap TIDs from dead to unused. > 4. Somebody inserts a new tuple at one of the existing TIDs, and the > index tuple gets put on the page scanned in step 1. > 5. The page-at-a-time scan resumes and kills the tuple added in step 4 > by mistake, when it really only intended to kill a tuple removed in > step 2. > > What prevents this is: > > A. To begin scanning a bucket, VACUUM needs a cleanup lock on the > primary bucket page. Therefore, there are no scans in progress at the > time that VACUUM begins scanning the bucket. > > B. If a scan begins scanning the bucket, it can't pass VACUUM, because > VACUUM doesn't release the page lock on one page before taking the one > for the next page. > > C. After 0003, it becomes possible for a scan to pass VACUUM if the > table is permanent, but it won't be a problem because of the LSN > check. > That's right. So, in short, this patch handles the problematic scenario. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Scan Mode in Hash Index
On Wed, Sep 20, 2017 at 4:56 PM, Robert Haas wrote: > On Wed, Sep 20, 2017 at 7:19 AM, Amit Kapila wrote: >>> Page-at-a-time index vacuum as in _hash_vacuum_one_page doesn't matter >>> because such an operation doesn't allow TIDs to be reused. >> >> Page-at-a-time index vacuum also allows TIDs to be reused but this is >> done only for already marked dead items whereas vacuum can make the >> non-dead entries to be removed. We don't have a problem with >> page-at-a-time vacuum as it won't remove any items which the scan is >> going to mark as dead. > > I don't think page-at-a-time index vacuum allows heap TIDs to be > reused. > Right, I was thinking from the perspective of the index entry. Before marking index entry as dead, we do check for heaptid. So, as heaptid can't be reused via Page-at-a-time index vacuum, scan won't mark index entry as dead. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Scan Mode in Hash Index
On Wed, Sep 20, 2017 at 4:04 PM, Robert Haas wrote: > On Tue, Sep 19, 2017 at 11:34 PM, Amit Kapila wrote: >> This point has been discussed above [1] and to avoid this problem we >> are keeping the scan always behind vacuum for unlogged and temporary >> tables as we are doing without this patch. That will ensure vacuum >> won't be able to remove the TIDs which we are going to mark as dead. >> This has been taken care in patch 0003. I understand that this is >> slightly ugly, but the other alternative (as mentioned in the email >> [1]) is much worse. > > Hmm. So if I understand correctly, you're saying that the LSN check > in patch 0001 is actually completely unnecessary if we only apply > 0001, but is needed in preparation for 0003, after which it will > really be doing something? > Right. > In more detail, I suppose the idea is: a TID cannot be reused until a > VACUUM has intervened; VACUUM always visits every data page in the > index; we won't allow a scan to pass VACUUM (and thus possibly have > one of its TIDs get reused) except when the LSN check is actually > sufficient to guarantee no TID reuse (i.e. table is not unlogged). Right. > Page-at-a-time index vacuum as in _hash_vacuum_one_page doesn't matter > because such an operation doesn't allow TIDs to be reused. > Page-at-a-time index vacuum also allows TIDs to be reused but this is done only for already marked dead items whereas vacuum can make the non-dead entries to be removed. We don't have a problem with page-at-a-time vacuum as it won't remove any items which the scan is going to mark as dead. > Did I get it right? > I think so. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] why not parallel seq scan for slow functions
On Wed, Sep 20, 2017 at 3:05 AM, Jeff Janes wrote: > On Tue, Sep 19, 2017 at 1:17 PM, Thomas Munro > wrote: >> >> On Thu, Sep 14, 2017 at 3:19 PM, Amit Kapila >> wrote: >> > The attached patch fixes both the review comments as discussed above. > > > that should be fixed by turning costs off in the explain, as is the tradition. > Right. BTW, did you get a chance to run the original test (for which you reported the problem) with this patch? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] src/test/subscription/t/002_types.pl hanging on particular environment
On Wed, Sep 20, 2017 at 9:23 AM, Tom Lane wrote: > Craig Ringer writes: >> On 19 September 2017 at 18:04, Petr Jelinek >> wrote: >>> If you are asking why they are not identified by the >>> BackgroundWorkerHandle, then it's because it's a private struct and can't >>> be shared with other processes, so there is no way to link the logical >>> worker info with a bgworker directly. > >> I really want BackgroundWorkerHandle to be public, strong +1 from me. > > I'm confused about what you think that would accomplish. AFAICS, the > point of BackgroundWorkerHandle is to allow the creator/requestor of > a bgworker to verify whether or not the slot in question is still > "owned" by its request. > Right, but if we have access to BackgroundWorkerHandle, can we avoid maintaining additional information (in_use, generation, ...) in LogicalRepWorker that duplicates the bgworker machinery (which in turn would also avoid all the housekeeping for those variables)? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Setting pd_lower in GIN metapage
On Wed, Sep 20, 2017 at 4:25 AM, Michael Paquier wrote: > On Wed, Sep 20, 2017 at 12:47 AM, Tom Lane wrote: >> Amit Kapila writes: >>> On Tue, Sep 19, 2017 at 9:27 AM, Michael Paquier >>> wrote: >>>> I am not saying that no index AMs take advantage of FPW compressibility >>>> for their meta pages. There are cases like this one, as well as one >>>> code path in BRIN where this is useful, and it is useful as well when >>>> logging FPWs of the init forks for unlogged relations. >> >>> Hmm, why is it useful for logging FPWs of the init forks for unlogged >>> relations? We don't use REGBUF_STANDARD in those cases. >> >> But if we started to do so, that would be a concrete benefit of this >> patch ... > > In the proposed set of patches, all the empty() routines of the index > AMs which use log_newpage_buffer() (brin, gin, spgist) are doing the > right thing. btree also should have > its call to log_newpage updated in btbuildempty(), and your patch is > missing that. > We can add that for the btree patch. > Also, _hash_init() would need some extra work to > generate FPWs, but I don't think that it is necessary per its handling > of per-record metadata either. So REGBUF_STANDARD could be just > removed from there, and there is actually no need to patch > src/backend/access/hash at all. > I think there is no need to remove it. As per the discussion above, we want to keep REGBUF_STANDARD for all metapage initializations for the sake of consistency, and that will be useful for wal_consistency_checking, in which case we anyway need a full-page image. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Scan Mode in Hash Index
On Tue, Sep 19, 2017 at 9:49 PM, Robert Haas wrote: > On Thu, Aug 24, 2017 at 11:26 AM, Jesper Pedersen > wrote: >> Based on the feedback in this thread, I have moved the patch to "Ready for >> Committer". > > Reviewing 0001: > > _hash_readpage gets the page LSN to see if we can apply LP_DEAD hints, > but if the table is unlogged or temporary, the LSN will never change, > so the test in _hash_kill_items will always think that it's OK to > apply the hints. (This seems like it might be a pretty serious > problem, because I'm not sure what would be a viable workaround.) > This point has been discussed above [1] and to avoid this problem we are keeping the scan always behind vacuum for unlogged and temporary tables as we are doing without this patch. That will ensure vacuum won't be able to remove the TIDs which we are going to mark as dead. This has been taken care in patch 0003. I understand that this is slightly ugly, but the other alternative (as mentioned in the email [1]) is much worse. [1] - https://www.postgresql.org/message-id/CAA4eK1J6xiJUOidBaOt0iPsAdS0%2Bp5PoKFf1R2yVjTwrY_4snA%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] src/test/subscription/t/002_types.pl hanging on particular environment
On Tue, Sep 19, 2017 at 6:51 PM, Petr Jelinek wrote: > On 19/09/17 15:08, Amit Kapila wrote: >> >> I am not much aware of this area. Can you explain what other usages >> it has apart from in the process that has launched the worker and in >> worker itself? >> > > The subscription "DDL" commands and monitoring functions need access to > that info. > Got your point, but still not sure if we need to maintain additional information to replicate something similar to bgworker machinery. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] src/test/subscription/t/002_types.pl hanging on particular environment
On Tue, Sep 19, 2017 at 3:34 PM, Petr Jelinek wrote: > n 18/09/17 18:42, Tom Lane wrote: >> Amit Kapila writes: >>> On Mon, Sep 18, 2017 at 7:46 PM, Tom Lane wrote: >>>> The subscriber log includes >>>> 2017-09-18 08:43:08.240 UTC [15672] WARNING: out of background worker >>>> slots >>>> Maybe that's harmless, but I'm suspicious that it's a smoking gun. >> >>> I have noticed while fixing the issue you are talking that this path >>> is also susceptible to such problems. In >>> WaitForReplicationWorkerAttach(), it relies on >>> GetBackgroundWorkerPid() to know the status of the worker which won't >>> give the correct status in case of fork failure. The patch I have >>> posted has a fix for the issue, >> >> Your GetBackgroundWorkerPid fix looks good as far as it goes, but >> I feel that WaitForReplicationWorkerAttach's problems go way deeper >> than that --- in fact, that function should not exist at all IMO. >> >> The way that launcher.c is set up, it's relying on either the calling >> process or the called process to free the logicalrep slot when done. >> The called process might never exist at all, so the second half of >> that is obviously unreliable; but WaitForReplicationWorkerAttach >> can hardly be relied on either, seeing it's got a big fat exit-on- >> SIGINT in it. I thought about putting a PG_TRY, or more likely >> PG_ENSURE_ERROR_CLEANUP, into it, but that doesn't fix the basic >> problem: you can't assume that the worker isn't ever going to start, >> just because you got a signal that means you shouldn't wait anymore. >> >> Now, this rickety business might be fine if it were only about the >> states of the caller and called processes, but we've got long-lived >> shared data involved (ie the worker slots); failing to clean it up >> is not an acceptable outcome. >> > > We'll clean it up eventually if somebody requests creation of new > logical replication worker and that slot is needed. See the "garbage > collection" part in logicalrep_worker_launch(). 
I know it's not an ideal > solution, but it's the best one I could think of given the below. > >> So, frankly, I think we would be best off losing the "logical rep >> worker slot" business altogether, and making do with just bgworker >> slots. The alternative is getting the postmaster involved in cleaning >> up lrep slots as well as bgworker slots, and I'm going to resist >> any such idea strongly. That way madness lies, or at least an >> unreliable postmaster --- if we need lrep slots, what other forty-seven >> data structures are we going to need the postmaster to clean up >> in the future? >> >> I haven't studied the code enough to know why it grew lrep worker >> slots in the first place. Maybe someone could explain? >> > > I am not quite sure I understand this question; we need to store > additional info about workers in shared memory, so we need slots for that. > > If you are asking why they are not identified by the > BackgroundWorkerHandle, then it's because it's a private struct and can't > be shared with other processes, so there is no way to link the logical > worker info with a bgworker directly. > Not sure, but can't we identify the actual worker slot if we just have the pid of the background worker? IIUC, you need access to the BackgroundWorkerHandle from other processes so that you can stop them, as in the case of the DROP SUBSCRIPTION command. If so, then maybe storing the pid can serve the purpose. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Setting pd_lower in GIN metapage
On Tue, Sep 19, 2017 at 9:33 AM, Michael Paquier wrote: > On Tue, Sep 19, 2017 at 12:57 PM, Michael Paquier > wrote: >> I'd think about adjusting the comments the proper way for each AM so >> as one can read those comments and catch any limitation immediately. >> The fact this has not been pointed out on this thread before any >> checks and the many mails exchanged on the matter on this thread make >> me think that the current code does not outline the current code >> properties appropriately. > > Another thing that we could consider as well is adding an assertion in > XLogRegisterBuffer & friends so as the combination of REGBUF_STANDARD > and REGBUF_NO_IMAGE is forbidden. That's bugging me as well. > Is that related to this patch? If not, then maybe we can discuss it in a separate thread. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Setting pd_lower in GIN metapage
On Tue, Sep 19, 2017 at 9:27 AM, Michael Paquier wrote: > On Tue, Sep 19, 2017 at 12:40 PM, Amit Kapila wrote: >> On Mon, Sep 18, 2017 at 4:03 PM, Michael Paquier >> wrote: >>> On Mon, Sep 18, 2017 at 11:16 AM, Amit Kapila >>> wrote: >>>> On Sat, Sep 16, 2017 at 7:15 PM, Michael Paquier >>>> wrote: >>>>> >>>> >>>> You have already noticed above that it will help when >>>> wal_checking_consistency is used and that was the main motivation to >>>> pass REGBUF_STANDARD apart from maintaining consistency. It is not >>>> clear to me what is bothering you. If your only worry about these >>>> patches is that you want this sentence to be removed from the comment >>>> because you think it is obvious or doesn't make much sense, then I >>>> think we can leave this decision to committer. I have added it based >>>> on Tom's suggestion above thread about explaining why it is >>>> inessential or essential to set pd_lower. I think Amit Langote just >>>> tried to mimic what I have done in hash and btree patches to maintain >>>> consistency. I am also not very sure if we should write some detailed >>>> comment or leave the existing comment as it is. I think it is just a >>>> matter of different perspective. >>> >>> What is disturbing me a bit is that the existing comments mention >>> something that could be supported (the compression of pages), but >>> that's actually not done and this is unlikely to happen because the >>> number of bytes associated to a meta page is going to be always >>> cheaper than a FPW, which would cost in CPU to store it for >>> compression is enabled. So I think that we should switch comments to >>> mention that pd_lower is set so as this helps with page masking, but >>> we don't take advantage of XLOG compression in the code. >>> >> >> I think that is not true because we do need FPW for certain usages of >> metapage. 
Consider a case in _hash_doinsert where we register metabuf >> with just >> REGBUF_STANDARD; it can take advantage of removing the hole if >> pd_lower is set to its correct position. > I am not saying that no index AMs take advantage of FPW compressibility > for their meta pages. There are cases like this one, as well as one > code path in BRIN where this is useful, and it is useful as well when > logging FPWs of the init forks for unlogged relations. > Hmm, why is it useful for logging FPWs of the init forks for unlogged relations? We don't use REGBUF_STANDARD in those cases. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] src/test/subscription/t/002_types.pl hanging on particular environment
On Tue, Sep 19, 2017 at 6:29 PM, Petr Jelinek wrote: > On 19/09/17 14:33, Amit Kapila wrote: >> On Tue, Sep 19, 2017 at 3:34 PM, Petr Jelinek >> wrote: >>> n 18/09/17 18:42, Tom Lane wrote: >>> >>>> So, frankly, I think we would be best off losing the "logical rep >>>> worker slot" business altogether, and making do with just bgworker >>>> slots. >> >> I think that would be cleaner as compared to what we have now. >> >>>> The alternative is getting the postmaster involved in cleaning >>>> up lrep slots as well as bgworker slots, and I'm going to resist >>>> any such idea strongly. That way madness lies, or at least an >>>> unreliable postmaster --- if we need lrep slots, what other forty-seven >>>> data structures are we going to need the postmaster to clean up >>>> in the future? >>>> >>>> I haven't studied the code enough to know why it grew lrep worker >>>> slots in the first place. Maybe someone could explain? >>>> >>> >>> I am not quite sure I understand this question, we need to store >>> additional info about workers in shared memory so we need slots for that. >>> >> >> Yeah, but you could have used the way we do for parallel query where >> we setup dsm and share all such information. You can check the logic >> of execparallel.c and parallel.c to see how we do all such stuff for >> parallel query. >> > > I don't understand how DSM helps in this case (except perhaps the memory > allocation being dynamic rather than static). We still need this shared > memory area to be accessible from other places than launcher (the > paralllel query has one lead which manages everything, that's not true > for logical replication) > I am not much aware of this area. Can you explain what other usages it has apart from in the process that has launched the worker and in worker itself? > and we need it to survive restart of launcher > for example, so all the current issues will stay. > Do you mean to say that you want to save this part of shared memory across restart of launcher? 
-- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] src/test/subscription/t/002_types.pl hanging on particular environment
On Tue, Sep 19, 2017 at 3:34 PM, Petr Jelinek wrote: > On 18/09/17 18:42, Tom Lane wrote: > >> So, frankly, I think we would be best off losing the "logical rep >> worker slot" business altogether, and making do with just bgworker >> slots. I think that would be cleaner as compared to what we have now. >> The alternative is getting the postmaster involved in cleaning >> up lrep slots as well as bgworker slots, and I'm going to resist >> any such idea strongly. That way madness lies, or at least an >> unreliable postmaster --- if we need lrep slots, what other forty-seven >> data structures are we going to need the postmaster to clean up >> in the future? >> >> I haven't studied the code enough to know why it grew lrep worker >> slots in the first place. Maybe someone could explain? >> > > I am not quite sure I understand this question; we need to store > additional info about workers in shared memory, so we need slots for that. > Yeah, but you could have used the way we do it for parallel query, where we set up a DSM segment and share all such information. You can check the logic of execParallel.c and parallel.c to see how we do all such stuff for parallel query. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Setting pd_lower in GIN metapage
On Mon, Sep 18, 2017 at 4:03 PM, Michael Paquier wrote: > On Mon, Sep 18, 2017 at 11:16 AM, Amit Kapila wrote: >> On Sat, Sep 16, 2017 at 7:15 PM, Michael Paquier >> wrote: >>> >> >> You have already noticed above that it will help when >> wal_checking_consistency is used and that was the main motivation to >> pass REGBUF_STANDARD apart from maintaining consistency. It is not >> clear to me what is bothering you. If your only worry about these >> patches is that you want this sentence to be removed from the comment >> because you think it is obvious or doesn't make much sense, then I >> think we can leave this decision to committer. I have added it based >> on Tom's suggestion above thread about explaining why it is >> inessential or essential to set pd_lower. I think Amit Langote just >> tried to mimic what I have done in hash and btree patches to maintain >> consistency. I am also not very sure if we should write some detailed >> comment or leave the existing comment as it is. I think it is just a >> matter of different perspective. > > What is disturbing me a bit is that the existing comments mention > something that could be supported (the compression of pages), but > that's actually not done and this is unlikely to happen because the > number of bytes associated to a meta page is going to be always > cheaper than a FPW, which would cost in CPU to store it for > compression is enabled. So I think that we should switch comments to > mention that pd_lower is set so as this helps with page masking, but > we don't take advantage of XLOG compression in the code. > I think that is not true because we do need FPW for certain usages of metapage. Consider a case in _hash_doinsert where register metabuf with just REGBUF_STANDARD, it can take advantage of removing the hole if pd_lower is set to its correct position. There are other similar usages in hash index. 
For other indexes like btree, there is no such usage currently, but they can also take advantage of it for wal_consistency_checking. Now, one could probably argue that we should use different comments for different indexes as the usage varies, but I think someone looking at the code after reading the comments can differentiate such cases. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallel.c oblivion of worker-startup failures
On Mon, Sep 18, 2017 at 10:00 PM, Tom Lane wrote: > Amit Kapila writes: >> Attached patch fixes these problems. > > Hmm, this patch adds a kill(notify_pid) after one call to > ForgetBackgroundWorker, but the postmaster has several more such calls. > Shouldn't they all notify the notify_pid? Should we move that > functionality into ForgetBackgroundWorker itself, so we can't forget > it again? > Among other places, we already notify during ReportBackgroundWorkerExit(). However, it seems to me that all the other places, except where this patch has added a call to notify, don't need such a call. The calls in DetermineSleepTime and ResetBackgroundWorkerCrashTimes are for a crashed worker, which I don't think requires notification to the backend, as that backend itself would have restarted. The other place where we call ForgetBackgroundWorker is in maybe_start_bgworkers when rw_terminate is set to true, which again seems to be either the case of a worker crash or of someone having explicitly asked to terminate the worker, in which case we already send a notification. I think we need to notify the backend on start, stop and failure to start a worker. OTOH, if it is harmless to send a notification even after the worker has crashed, then we can just move that functionality into ForgetBackgroundWorker itself, as that will simplify the code; in fact, that is the first thing that occurred to me as well, but I didn't do it that way as I was not sure if we want to send a notification in all kinds of cases. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] src/test/subscription/t/002_types.pl hanging on particular environment
On Mon, Sep 18, 2017 at 7:46 PM, Tom Lane wrote: > Thomas Munro writes: >> In this build you can see the output of the following at the end, >> which might provide clues to the initiated. You might need to click a >> small triangle to unfold the commands' output. > >> cat ./src/test/subscription/tmp_check/log/002_types_publisher.log >> cat ./src/test/subscription/tmp_check/log/002_types_subscriber.log >> cat ./src/test/subscription/tmp_check/log/regress_log_002_types > > The subscriber log includes > 2017-09-18 08:43:08.240 UTC [15672] WARNING: out of background worker slots > 2017-09-18 08:43:08.240 UTC [15672] HINT: You might need to increase > max_worker_processes. > > Maybe that's harmless, but I'm suspicious that it's a smoking gun. > I think perhaps this reflects a failed attempt to launch a worker, > which the caller does not realize has failed to launch because of the > lack of worker-fork-failure error recovery I bitched about months ago > [1], leading to subscription startup waiting forever for a worker that's > never going to report finishing. > > I see Amit K. just posted a patch in that area [2], haven't looked at it > yet. > I have noticed while fixing the issue you are talking that this path is also susceptible to such problems. In WaitForReplicationWorkerAttach(), it relies on GetBackgroundWorkerPid() to know the status of the worker which won't give the correct status in case of fork failure. The patch I have posted has a fix for the issue, however, this could be an entirely different issue altogether as it appears from your next email in this thread. [1] - https://www.postgresql.org/message-id/CAA4eK1KDfKkvrjxsKJi3WPyceVi3dH1VCkbTJji2fuwKuB%3D3uw%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GatherMerge misses to push target list
On Mon, Sep 18, 2017 at 12:46 PM, Rushabh Lathia wrote: > Thanks Amit for the patch. > > I reviewed the code changes as well as performed more testing. Patch > looks good to me. > In that case, can you please mark the patch [1] as ready for committer in CF app > Here is the updated patch - where added test-case clean up. > oops, missed dropping the function. Thanks for the review. [1] - https://commitfest.postgresql.org/15/1293/ -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
[HACKERS] parallel.c oblivion of worker-startup failures
Some time back, Tom Lane reported [1] the $Subject issue. I have looked into it and found that the problem is not limited to parallel workers: the general background worker machinery is affected as well in situations where fork() or some similar failure occurs. The first problem is that after we register a dynamic worker, the means of learning whether the worker has started (WaitForBackgroundWorkerStartup/GetBackgroundWorkerPid) won't give the right answer if a fork failure happens. Also, in cases where the worker is marked not to restart after a crash, the postmaster doesn't notify the backend when it is unable to start the worker, which can make the backend wait forever, since it is oblivious to the fact that the worker never started. Apart from these general problems in the background worker machinery, parallel.c assumes that once it has registered the dynamic workers, they will start and perform their work. We need to ensure that if the postmaster is unable to start parallel workers due to a fork failure or any similar situation, the backend doesn't keep waiting forever. To fix this, before waiting for workers to finish, we can check whether the worker exists at all. The attached patch fixes these problems. Another way to fix the parallel-query-related problem would be for the master backend, after registering the workers, to wait for the workers to start before setting up the queues used to communicate with them; I think that could be quite expensive. Thoughts? [1] - https://www.postgresql.org/message-id/4905.1492813...@sss.pgh.pa.us -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com fix_worker_startup_failures_v1.patch Description: Binary data
Re: [HACKERS] Setting pd_lower in GIN metapage
On Sat, Sep 16, 2017 at 7:15 PM, Michael Paquier wrote: > On Fri, Sep 15, 2017 at 4:22 PM, Amit Langote > wrote: >> On 2017/09/14 16:00, Michael Paquier wrote: >>> On Wed, Sep 13, 2017 at 4:43 PM, Amit Langote >>> wrote: >>>> Sure, no problem. >>> >>> OK, I took a closer look at all patches, but did not run any manual >>> tests to test the compression except some stuff with >>> wal_consistency_checking. >> >> Thanks for the review. >> >>> For SpGist, I think that there are two missing: spgbuild() and >>> spgGetCache(). >> >> spgbuild() calls SpGistInitMetapage() before marking the metapage buffer >> dirty. The latter already sets pd_lower correctly, so we don't need to do >> it explicitly in spgbuild() itself. > > Check. > >> spgGetCache() doesn't write the metapage, only reads it: >> >> /* Last, get the lastUsedPages data from the metapage */ >> metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO); >> LockBuffer(metabuffer, BUFFER_LOCK_SHARE); >> >> metadata = SpGistPageGetMeta(BufferGetPage(metabuffer)); >> >> if (metadata->magicNumber != SPGIST_MAGIC_NUMBER) >> elog(ERROR, "index \"%s\" is not an SP-GiST index", >> RelationGetRelationName(index)); >> >> cache->lastUsedPages = metadata->lastUsedPages; >> >> UnlockReleaseBuffer(metabuffer); >> >> So, I think it won't be correct to set pd_lower here, no? > > Yeah, I am just reading the code again and there is no alarm here. > >> Updated patch attached, which implements your suggested changes to the >> masking functions. >> >> By the way, as I noted on another unrelated thread, I will not be able to >> respond to emails from tomorrow until 9/23. > > No problem. Enjoy your vacations. > > I have spent some time looking at the XLOG insert code, and tested the > compressibility of the meta pages. 
And I have noticed that all pages > registered with REGBUF_WILL_INIT will force XLogRecordAssemble to not > take a FPW of the block registered because the page will be > reinitialized at replay, and in such cases the zero'ed page is filled > with the data from the record. log_newpage is used to initialize init > forks for unlogged relations, which is where this patch allows page > compression to take effect correctly. Still setting REGBUF_STANDARD > with REGBUF_WILL_INIT is actually a no-op, except if > wal_consistency_checking is used when registering a buffer for WAL > insertion. There is one code path though where things are useful all > the time: revmap_physical_extend for BRIN. > > The patch is using the correct logic, still such comments are in my > opinion incorrect because of the reason written above: > +* This won't be of any help unless we use option like REGBUF_STANDARD > +* while registering the buffer for metapage as otherwise, it won't try to > +* remove the hole while logging the full page image. > Here REGBUF_STANDARD is actually a no-op for btree. > You have already noticed above that it helps when wal_consistency_checking is used, and that was the main motivation to pass REGBUF_STANDARD, apart from maintaining consistency. It is not clear to me what is bothering you. If your only worry about these patches is that you want this sentence removed from the comment because you think it is obvious or doesn't make much sense, then I think we can leave this decision to the committer. I have added it based on Tom's suggestion upthread about explaining why it is essential or inessential to set pd_lower. I think Amit Langote just tried to mimic what I have done in the hash and btree patches to maintain consistency. I am also not very sure whether we should write a more detailed comment or leave the existing comment as it is; I think it is just a matter of different perspective. -- With Regards, Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Append implementation
On Thu, Sep 14, 2017 at 8:30 PM, Amit Khandekar wrote: > On 11 September 2017 at 18:55, Amit Kapila wrote: >>> >> >> How? See, if you have four partial subpaths and two non-partial >> subpaths, then for parallel-aware append it considers all six paths in >> parallel path whereas for non-parallel-aware append it will consider >> just four paths and that too with sub-optimal strategy. Can you >> please try to give me some example so that it will be clear. > > Suppose 4 appendrel children have costs for their cheapest partial (p) > and non-partial paths (np) as shown below : > > p1=5000 np1=100 > p2=200 np2=1000 > p3=80 np3=2000 > p4=3000 np4=50 > > Here, following two Append paths will be generated : > > 1. a parallel-aware Append path with subpaths : > np1, p2, p3, np4 > > 2. Partial (i.e. non-parallel-aware) Append path with all partial subpaths: > p1,p2,p3,p4 > > Now, one thing we can do above is : Make the path#2 parallel-aware as > well; so both Append paths would be parallel-aware. > Yes, we can do that, and that is what I think is probably better. So the question remains: in which case will a non-parallel-aware partial append be required? Basically, it is not clear to me why, after having a parallel-aware partial append, we would need the non-parallel-aware version. Are you keeping it for the sake of backward compatibility, or for cases where someone has disabled parallel append with the help of the new GUC in this patch? > > So above, what I am saying is, we can't tell which of the paths #1 and > #2 are cheaper until we calculate total cost. I didn't understand what > did you mean by "non-parallel-aware append will consider only the > partial subpaths and that too with sub-optimal strategy" in the above > example. I guess, you were considering a different scenario than the > above one. > Yes, something different, but I think you can ignore that, as we can discuss the guts of my point based on the example you gave above. -- With Regards, Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Append implementation
On Thu, Sep 14, 2017 at 9:41 PM, Robert Haas wrote: > On Mon, Sep 11, 2017 at 9:25 AM, Amit Kapila wrote: >> I think the patch stores only non-partial paths in decreasing order, >> what if partial paths having more costs follows those paths? > > The general picture here is that we don't want the leader to get stuck > inside some long-running operation because then it won't be available > to read tuples from the workers. On the other hand, we don't want to > just have the leader do no work because that might be slow. And in > most cases, the leader will be the first participant to arrive at > the Append node, because of the worker startup time. So the idea is > that the workers should pick expensive things first, and the leader > should pick cheap things first. > At a broader level, the idea is good, but I think it won't turn out exactly like that, considering your paragraph quoted here, which indicates that it is okay if the leader picks a partial path that is costly among other partial paths, since the leader won't be stuck with it. Granting that this is a good design for parallel append, the question is: do we really need the worker and leader to follow separate strategies for choosing the next path? I think the patch will be simpler if we can come up with a way for the worker and leader to use the same strategy to pick the next path to process. How about we arrange the list of paths such that all partial paths come first, followed by the non-partial paths, probably both in decreasing order of cost? Then both the leader and the workers can start from the beginning of the list. In most cases, the leader will start at the first partial path and will only ever need to take a non-partial path if there is no other partial path left. This is not bulletproof, as it is possible that some worker starts before the leader, in which case the leader might take a non-partial path before all partial paths are finished, but I think we can avoid that as well if we are too worried about such cases. 
-- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] parallelize queries containing initplans
On Thu, Aug 31, 2017 at 11:23 AM, Amit Kapila wrote: > On Mon, Aug 21, 2017 at 2:40 PM, Amit Kapila wrote: >> On Mon, Aug 21, 2017 at 1:44 PM, Haribabu Kommi >> wrote: >>> >>> >>> Thanks for adding more details. It is easy to understand. >>> >>> I marked the patch as ready for committer in the commitfest. >>> > > Rebased the patch. The output of test case added by the patch is also > slightly changed because of the recent commit > 7df2c1f8daeb361133ac8bdeaf59ceb0484e315a. I have verified that the > new test result is as expected. > The latest patch again needed a rebase; please find the rebased patch attached to this email. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com pq_pushdown_initplan_v9.patch Description: Binary data
Re: [HACKERS] Setting pd_lower in GIN metapage
On Thu, Sep 14, 2017 at 12:30 PM, Michael Paquier wrote: > On Wed, Sep 13, 2017 at 4:43 PM, Amit Langote > wrote: >> Sure, no problem. > > > +* > +* This won't be of any help unless we use option like REGBUF_STANDARD > +* while registering the buffer for metapage as otherwise, it won't try to > +* remove the hole while logging the full page image. > */ > This comment is in the btree code. But you actually add > REGBUF_STANDARD. So I think that this could be just removed. > > * Set pd_lower just past the end of the metadata. This is not essential > -* but it makes the page look compressible to xlog.c. > +* but it makes the page look compressible to xlog.c. See > +* _bt_initmetapage. > This reference could be removed as well as _bt_initmetapage does not > provide any information, the existing comment is wrong without your > patch, and then becomes right with this patch. > I have added this comment just to give some explanation as to why we are setting pd_lower and what makes it useful. We can change it or remove it, but I am not sure what the right thing to do here is; maybe we can defer this to the committer. > > It seems to me that an update of pd_lower is missing in _bt_getroot(), > just before marking the buffer as dirty I think. And there is a second > in _bt_unlink_halfdead_page(), a third in _bt_insertonpg() and finally > one in _bt_newroot(). > > > For hash, hashbulkdelete(), _hash_vacuum_one_page(), > _hash_addovflpage(), _hash_freeovflpage() and _hash_doinsert() are > missing the shot, no? We could have a meta page of a hash index > upgraded from PG10. > Why do we need to change the metapage in every place for btree or hash? Any index that is upgraded should have pd_lower set; do you have any case in mind where it won't be? For hash, if someone upgrades from a version lower than 9.6, it might not be set, but we already give a warning to reindex hash indexes upgraded from a version lower than 10. -- With Regards, Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] GatherMerge misses to push target list
On Wed, Sep 13, 2017 at 5:30 PM, Rushabh Lathia wrote: > On Wed, Sep 6, 2017 at 10:04 AM, Amit Kapila > wrote: >> > > > This seems like a good optimization. I tried to simulate the test given > in the mail, initially wasn't able to generate the exact test - as index > creation is missing in the test shared. > Oops. > I also won't consider this as a bug, but it's definitely a good optimization > for GatherMerge. > >> >> >> Note - If we agree on the problems and fix, then I can add regression >> tests to cover above cases in the patch. > > > Sure, once you do that - I will review the patch. > The attached patch contains a regression test as well. Thanks for looking into it. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com pushdown_target_gathermerge_v2.patch Description: Binary data
Re: [HACKERS] why not parallel seq scan for slow functions
On Wed, Sep 13, 2017 at 9:39 AM, Amit Kapila wrote: > On Tue, Sep 12, 2017 at 5:47 PM, Amit Khandekar > wrote: >> On 5 September 2017 at 14:04, Amit Kapila wrote: >> >> I started with a quick review ... a couple of comments below : >> >> - * If this is a baserel, consider gathering any partial paths we may have >> - * created for it. (If we tried to gather inheritance children, we could >> + * If this is a baserel and not the only rel, consider gathering any >> + * partial paths we may have created for it. (If we tried to gather >> >> /* Create GatherPaths for any useful partial paths for rel */ >> - generate_gather_paths(root, rel); >> + if (lev < levels_needed) >> + generate_gather_paths(root, rel, NULL); >> >> I think at the above two places, and may be in other place also, it's >> better to mention the reason why we should generate the gather path >> only if it's not the only rel. >> > > I think the comment you are looking is present where we are calling > generate_gather_paths in grouping_planner. Instead of adding same or > similar comment at multiple places, how about if we just say something > like "See in grouping_planner where we generate gather paths" at all > other places? > >> -- >> >> - if (rel->reloptkind == RELOPT_BASEREL) >> - generate_gather_paths(root, rel); >> + if (rel->reloptkind == RELOPT_BASEREL && >> root->simple_rel_array_size > 2) >> + generate_gather_paths(root, rel, NULL); >> >> Above, in case it's a partitioned table, root->simple_rel_array_size >> includes the child rels. So even if it's a simple select without a >> join rel, simple_rel_array_size would be > 2, and so gather path would >> be generated here for the root table, and again in grouping_planner(). >> > > Yeah, that could be a problem. I think we should ensure that there is > no append rel list by checking root->append_rel_list. Can you think > of a better way to handle it? > The attached patch fixes both the review comments as discussed above. -- With Regards, Amit Kapila. 
EnterpriseDB: http://www.enterprisedb.com parallel_paths_include_tlist_cost_v3.patch Description: Binary data
Re: [HACKERS] why not parallel seq scan for slow functions
On Tue, Sep 12, 2017 at 5:47 PM, Amit Khandekar wrote: > On 5 September 2017 at 14:04, Amit Kapila wrote: > > I started with a quick review ... a couple of comments below : > > - * If this is a baserel, consider gathering any partial paths we may have > - * created for it. (If we tried to gather inheritance children, we could > + * If this is a baserel and not the only rel, consider gathering any > + * partial paths we may have created for it. (If we tried to gather > > /* Create GatherPaths for any useful partial paths for rel */ > - generate_gather_paths(root, rel); > + if (lev < levels_needed) > + generate_gather_paths(root, rel, NULL); > > I think at the above two places, and may be in other place also, it's > better to mention the reason why we should generate the gather path > only if it's not the only rel. > I think the comment you are looking for is present where we are calling generate_gather_paths in grouping_planner. Instead of adding the same or a similar comment at multiple places, how about if we just say something like "See in grouping_planner where we generate gather paths" at all other places? > -- > > - if (rel->reloptkind == RELOPT_BASEREL) > - generate_gather_paths(root, rel); > + if (rel->reloptkind == RELOPT_BASEREL && > root->simple_rel_array_size > 2) > + generate_gather_paths(root, rel, NULL); > > Above, in case it's a partitioned table, root->simple_rel_array_size > includes the child rels. So even if it's a simple select without a > join rel, simple_rel_array_size would be > 2, and so gather path would > be generated here for the root table, and again in grouping_planner(). > Yeah, that could be a problem. I think we should ensure that there is no append rel list by checking root->append_rel_list. Can you think of a better way to handle it? -- With Regards, Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Setting pd_lower in GIN metapage
On Tue, Sep 12, 2017 at 3:51 PM, Amit Langote wrote: > On 2017/09/11 18:13, Michael Paquier wrote: >> On Mon, Sep 11, 2017 at 5:40 PM, Amit Langote wrote: >>> On 2017/09/10 15:22, Michael Paquier wrote: >>>> Coordinating efforts here would be nice. If you, Amit K, are taking >>>> care of a patch for btree and hash, would you, Amit L, write the part >>>> for GIN, BRIN and SpGist? This needs a careful lookup as many code >>>> paths need a lookup so it may take time. Please note that I don't mind >>>> double-checking this part if you don't have enough room to do so. >>> >>> Sorry, I didn't have time today to carefully go through the recent >>> discussion on this thread (starting with Tom's email wherein he said he >>> set the status of the patch to Waiting on Author). I will try tomorrow. >> >> Thanks for the update! Once you get to this point, please let me know >> if you would like to work on a more complete patch for brin, gin and >> spgist. If you don't have enough room, I am fine to produce something. > > I updated the patches for GIN, BRIN, and SP-GiST to include the following > changes: > > 1. Pass REGBUF_STANDARD flag when registering the metapage buffer > I have looked into the brin patch, and it seems you have not considered all usages of the meta page. The BrinRevmap structure also contains a reference to the meta page buffer, and when that is modified (e.g., in revmap_physical_extend), I think you also need to consider using the REGBUF_STANDARD flag. > > Did I miss something from the discussion? > I think one point which might have been missed is that the patch needs to modify pd_lower for all usages of the metapage, not only when it is first initialized. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] GatherMerge misses to push target list
On Wed, Sep 6, 2017 at 10:04 AM, Amit Kapila wrote: > During my recent work on costing of parallel paths [1], I noticed that > we are missing to push target list below GatherMerge in some simple > cases like below. > This could be considered a bug fix for 10.0, but since it doesn't block any functionality or give wrong results, we might instead treat it as an optimization for GatherMerge. In any case, I have added this to the next CF. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Append implementation
On Mon, Sep 11, 2017 at 4:49 PM, Amit Khandekar wrote: > On 8 September 2017 at 19:17, Amit Kapila wrote: >> >> In that case, why can't we keep the workers also process in same >> order, what is the harm in that? > > Because of the way the logic of queuing works, the workers finish > earlier if they start with expensive plans first. For e.g. : with 3 > plans with costs 8, 4, 4 and with 2 workers w1 and w2, they will > finish in 8 time units (w1 will finish plan 1 in 8, while in parallel > w2 will finish the remaining 2 plans in 8 units. Whereas if the plans > are ordered like : 4, 4, 8, then the workers will finish in 12 time > units (w1 and w2 will finish each of the 1st two plans in 4 units, and > then w1 or w2 will take up plan 3 and finish in 8 units, while the > other worker remains idle). > I think the patch stores only non-partial paths in decreasing order, what if partial paths having more costs follows those paths? > >> >> Few more comments: >> >> 1. >> + else if (IsA(subpath, MergeAppendPath)) >> + { >> + MergeAppendPath *mpath = (MergeAppendPath *) subpath; >> + >> + /* >> + * If at all MergeAppend is partial, all its child plans have to be >> + * partial : we don't currently support a mix of partial and >> + * non-partial MergeAppend subpaths. >> + */ >> + if (is_partial) >> + return list_concat(partial_subpaths, list_copy(mpath->subpaths)); >> >> In which situation partial MergeAppendPath is generated? Can you >> provide one example of such path? > > Actually currently we don't support partial paths for MergeAppendPath. > That code just has that if condition (is_partial) but currently that > condition won't be true for MergeAppendPath. > I think then it is better to have Assert for MergeAppend. It is generally not a good idea to add code which we can never hit. >> >> 2. >> add_paths_to_append_rel() .. 
>> >> I think it might be better to add a sentence why we choose a different >> way to decide a number of workers in the second case >> (non-parallel-aware append). > > Yes, I agree. Will do that after we conclude with your next point below ... > >> Do you think non-parallel-aware Append >> will be better in any case when there is a parallel-aware append? I >> mean to say let's try to create non-parallel-aware append only when >> parallel-aware append is not possible. > > By non-parallel-aware append, I am assuming you meant partial > non-parallel-aware Append. Yes, if the parallel-aware Append path has > *all* partial subpaths chosen, then we do omit a partial non-parallel > Append path, as seen in this code in the patch : > > /* > * Consider non-parallel partial append path. But if the parallel append > * path is made out of all partial subpaths, don't create another partial > * path; we will keep only the parallel append path in that case. > */ > if (partial_subpaths_valid && !pa_all_partial_subpaths) > { > .. > } > > But if the parallel-Append path has a mix of partial and non-partial > subpaths, then we can't really tell which of the two could be cheapest > until we calculate the cost. It can be that the non-parallel-aware > partial Append can be cheaper as well. > How? See, if you have four partial subpaths and two non-partial subpaths, then for parallel-aware append it considers all six paths in parallel path whereas for non-parallel-aware append it will consider just four paths and that too with sub-optimal strategy. Can you please try to give me some example so that it will be clear. >> >> 4. >> - select count(*) from a_star; >> -select count(*) from a_star; >> + select round(avg(aa)), sum(aa) from a_star; >> +select round(avg(aa)), sum(aa) from a_star; >> >> Why you have changed the existing test. It seems count(*) will also >> give what you are expecting. > > Needed to do cover some data testing with Parallel Append execution. > Okay. 
One more thing: I think you might want to update the comment atop add_paths_to_append_rel to reflect the new type of parallel-aware append path being generated. In particular, I am referring to the below part of the comment: * Similarly it collects partial paths from * non-dummy children to create partial append paths. */ static void add_paths_to_append_rel() -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Setting pd_lower in GIN metapage
On Mon, Sep 11, 2017 at 12:43 PM, Michael Paquier wrote: > On Mon, Sep 11, 2017 at 4:07 PM, Amit Kapila wrote: >> I have prepared separate patches for hash and btree index. I think >> for another type of indexes, it is better to first fix the pd_lower >> issue. > > Just wondering (sorry I have not looked at your patch in details)... > Have you tested the compressibility effects of this patch on FPWs with > and without wal_compression? > I have debugged it to confirm that it executes the code path that eliminates the hole for the hash index. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com