Re: [HACKERS] A GUC to prevent leader processes from running subplans?
On Sun, Nov 12, 2017 at 9:18 AM, Thomas Munro wrote:
> On Sat, Oct 21, 2017 at 8:09 AM, Robert Haas wrote:
>> On Tue, Oct 17, 2017 at 7:27 AM, Thomas Munro wrote:
>>
>> I don't think overloading force_parallel_mode is a good idea, but
>> having some other GUC for this seems OK to me. Not sure I like
>> multiplex_gather, though.
>
> How about parallel_leader_participation = on|off? The attached
> version has it that way, and adds regression tests to exercise on, off,
> and off-but-couldn't-start-any-workers for both kinds of gather node.
>
> I'm not sure why node->need_to_rescan is initialised by both
> ExecGatherInit() and ExecGather(). Only the latter's value matters,
> right?

I don't see anything like need_to_rescan in the GatherState node. Do
you intend to say need_to_scan_locally? If yes, then I think what you
said is right.

> I've added this to the January Commitfest.

+1 to this idea. Do you think such an option at the table level could be
meaningful? We have parallel_workers as a storage option for tables, so
users might want the leader to participate in parallelism only for some
of the tables.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
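For illustration, here is roughly how the proposed GUC and the existing table-level option would combine in a session. This is a sketch only: parallel_leader_participation exists only with the patch under discussion applied, and the table name is made up.

```sql
-- Proposed in the attached patch: keep the leader process from
-- executing the parallel portion of the plan itself.
SET parallel_leader_participation = off;

-- Existing per-table storage option: cap the planned worker count.
ALTER TABLE big_table SET (parallel_workers = 4);

-- With both in effect, only the workers scan big_table; the leader
-- just gathers tuples.
EXPLAIN (COSTS OFF) SELECT count(*) FROM big_table;
```

The table-level idea floated above would presumably be a third knob of the same kind, e.g. a hypothetical per-table variant of parallel_leader_participation.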
Re: [HACKERS] GatherMerge misses to push target list
On Sat, Nov 11, 2017 at 5:05 PM, Robert Haas wrote:
> On Mon, Sep 18, 2017 at 7:02 AM, Rushabh Lathia wrote:
>>> In that case, can you please mark the patch [1] as ready for committer
>>> in the CF app
>>
>> Done.
>
> I think this patch is mostly correct, but I think the change to
> planner.c isn't quite right. ordered_rel->reltarget is just a dummy
> target list that produces nothing. Instead, I think we should pass
> path->pathtarget, representing the idea that whatever Gather Merge
> produces as output is the same as what you put into it.

Agreed. Your change looks good to me.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] parallelize queries containing initplans
On Sat, Nov 11, 2017 at 12:15 AM, Robert Haas wrote:
> On Tue, Nov 7, 2017 at 4:45 AM, Amit Kapila wrote:
>> As mentioned, changed the status of the patch in CF app.
>
> I spent some time reviewing this patch today and found myself still
> quite uncomfortable with the fact that it was adding execution-time
> work to track the types of parameters - types that would usually not
> even be used. I found the changes to nodeNestLoop.c to be
> particularly objectionable, because we could end up doing the work
> over and over when it is actually not needed at all, or at most once.

That's right, but we are just accessing the tuple descriptor to get the
type, so there shouldn't be much work involved in that. However, your
approach has the merit that we don't even need to do that during
execution time.

> I decided to try instead teaching the planner to keep track of the
> types of PARAM_EXEC parameters as they were created, and that seems to
> work fine. See 0001, attached.

This looks good to me.

> 0002, attached, is my worked-over version of the rest of the patch. I
> moved the code that serializes and deserializes PARAM_EXEC from
> nodeSubplan.c -- which seemed like a strange choice -- to
> execParallel.c.

I tried to follow the practice we have used for PARAM_EXTERN params
(SerializeParamList is in params.c), and most of the handling of
initplans is done in nodeSubplan.c, so I chose to keep the newly added
functions there. However, keeping them in execParallel.c is also okay,
as we do that for the serialized plan.

> I removed the type OID from the serialization format
> because there's no reason to do that any more; the worker already
> knows the types from the plan. I did some renaming of the functions
> involved and some adjustment of the comments to refer to "PARAM_EXEC
> parameters" instead of initPlan parameters, because there's no reason
> that I can see why this can only work for initPlans. A Gather node on
> the inner side of a nested loop doesn't sound like a great idea, but I
> think this infrastructure could handle it (though it would need some
> more planner work).

I think it would need some work in execution as well, because the size
won't be fixed in that case for varchar-type params. We might end up
with something different as well.

> I broke a lot of long lines in your version of
> the patch into multiple lines; please try to be attentive to this
> issue when writing patches in general, as it is a bit tedious to go
> through and insert line breaks in many places.

Okay, but I sometimes rely on pgindent for such things, as for a few
things it becomes difficult to decide which way will be better.

> Please let me know your thoughts on the attached patches.

Few minor comments:

1.
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -23,6 +23,7 @@
 #include "postgres.h"

+#include "executor/execExpr.h"
 #include "executor/execParallel.h"
 #include "executor/executor.h"
 #include "executor/nodeBitmapHeapscan.h"
@@ -31,6 +32,7 @@
 #include "executor/nodeIndexscan.h"
 #include "executor/nodeIndexonlyscan.h"
 #include "executor/nodeSeqscan.h"
+#include "executor/nodeSubplan.h"

This is not required if we move serialize and other functions to
execParallel.c.

2.
+set_param_references(PlannerInfo *root, Plan *plan)
+{
+	Assert(IsA(plan, Gather) ||IsA(plan, GatherMerge));

I think there should be a space after the || operator.

3.
+/*
+ * Serialize ParamExecData params corresponding to initplans.
+ * ..
+/*
+ * Restore ParamExecData params corresponding to initplans.
+ */

Shouldn't we change the reference to initplans here as well?

I have fixed the first two in the attached patch and left the last one,
as I was not sure what you have in mind.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

0002-pq-pushdown-initplan-rebased-1.patch
Description: Binary data
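For context, the kind of plan shape this patch series is after — an initPlan evaluated once in the leader and shipped to workers as a PARAM_EXEC value — would look roughly like this (a sketch only: table and column names are invented, the exact EXPLAIN output depends on the patch version applied, and an unpatched server will not parallelize such a query):

```sql
EXPLAIN (COSTS OFF)
SELECT * FROM t1 WHERE t1.a = (SELECT max(b) FROM t2);

-- Approximate expected shape with the patch:
--   Gather
--     Workers Planned: 2
--     InitPlan 1 (returns $0)
--       ->  Aggregate
--             ->  Seq Scan on t2
--     ->  Parallel Seq Scan on t1
--           Filter: (a = $0)
```

The serialization being discussed is what carries the computed value of $0 from the leader into each worker's parameter space.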
Re: [HACKERS] [POC] Faster processing at Gather node
On Fri, Nov 10, 2017 at 9:48 AM, Amit Kapila wrote:
> On Fri, Nov 10, 2017 at 8:36 AM, Robert Haas wrote:
>> On Thu, Nov 9, 2017 at 9:31 PM, Amit Kapila wrote:
>>> Have you set force_parallel_mode=regress; before running the
>>> statement?
>>
>> Yes, I tried that first.
>>
>>> If so, then why do you need to tune other parallel query
>>> related parameters?
>>
>> Because I couldn't get it to break the other way, I then tried this.
>>
>> Instead of asking me what I did, can you tell me what I need to do?
>> Maybe a self-contained reproducible test case including exactly what
>> goes wrong on your end?
>
> I think we are missing something very basic, because you should see the
> failure by executing that statement with force_parallel_mode=regress
> even in a freshly created database.

I am seeing the assertion failure as below on executing the above
mentioned CREATE statement:

TRAP: FailedAssertion("!(!(tup->t_data->t_infomask & 0x0008))", File:
"heapam.c", Line: 2634)
server closed the connection unexpectedly
        This probably means the server terminated abnormally

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [POC] Faster processing at Gather node
On Fri, Nov 10, 2017 at 8:36 AM, Robert Haas wrote:
> On Thu, Nov 9, 2017 at 9:31 PM, Amit Kapila wrote:
>> Have you set force_parallel_mode=regress; before running the
>> statement?
>
> Yes, I tried that first.
>
>> If so, then why do you need to tune other parallel query
>> related parameters?
>
> Because I couldn't get it to break the other way, I then tried this.
>
> Instead of asking me what I did, can you tell me what I need to do?
> Maybe a self-contained reproducible test case including exactly what
> goes wrong on your end?

I think we are missing something very basic, because you should see the
failure by executing that statement with force_parallel_mode=regress
even in a freshly created database. I guess the missing point is that I
am using an assertions-enabled build and probably you are not (if this
is the reason, then it should have struck me the first time). Anyway, I
am writing steps to reproduce the issue:

1. initdb
2. start server
3. connect using psql
4. set force_parallel_mode=regress;
5. Create Table as_select1 AS SELECT * FROM pg_class WHERE relkind = 'r';

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
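The steps above can be sketched as one shell session (paths are placeholders; this assumes a locally built server with assertions enabled, i.e. configured with --enable-cassert, on PATH):

```
initdb -D /tmp/repro-data
pg_ctl -D /tmp/repro-data -l /tmp/repro.log start
psql -d postgres <<'SQL'
SET force_parallel_mode = regress;
CREATE TABLE as_select1 AS SELECT * FROM pg_class WHERE relkind = 'r';
SQL
```

On an assert-enabled build with the patch in question applied, the CREATE TABLE step is what trips the FailedAssertion reported above.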
Re: [HACKERS] [POC] Faster processing at Gather node
On Fri, Nov 10, 2017 at 12:05 AM, Robert Haas wrote:
> On Thu, Nov 9, 2017 at 12:08 AM, Amit Kapila wrote:
>> This change looks suspicious to me. I think here we can't use the
>> tupDesc constructed from the targetlist. One problem I could see is
>> that the check for the hasOid setting in tlist_matches_tupdesc won't
>> give the correct answer. In the case of a scan, we use the tuple
>> descriptor stored in the relation descriptor, which allows us to take
>> the right decision in tlist_matches_tupdesc. If you try the statement
>> CREATE TABLE as_select1 AS SELECT * FROM pg_class WHERE relkind = 'r';
>> with force_parallel_mode=regress, then you can reproduce the problem I
>> am trying to highlight.
>
> I tried this, but nothing seemed to be obviously broken. Then I
> realized that the CREATE TABLE command wasn't using parallelism, so I
> retried with parallel_setup_cost = 0, parallel_tuple_cost = 0, and
> min_parallel_table_scan_size = 0. That got it to use parallel query,
> but I still don't see anything broken. Can you clarify further?

Have you set force_parallel_mode=regress; before running the statement?
If so, then why do you need to tune other parallel query related
parameters?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Runtime Partition Pruning
On Thu, Nov 9, 2017 at 9:01 PM, Robert Haas wrote:
> On Thu, Nov 9, 2017 at 6:18 AM, Beena Emerson wrote:
>> The code still chooses the custom plan instead of the generic plan for
>> prepared statements. I am working on it.
>
> I don't think it's really the job of this patch to do anything about
> that problem.

+1. I think if we really want to do something about plan choice when
partitions are involved, that should be done as a separate patch.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
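The custom-vs-generic plan interaction being discussed can be observed with a session like the following (a sketch: table and column names are invented, and exact plan choice depends on costing):

```sql
PREPARE q(int) AS SELECT * FROM parted_table WHERE part_key = $1;

-- The first several executions use custom plans, re-planned with the
-- actual parameter value, so ordinary plan-time partition pruning
-- already applies and only one partition is scanned:
EXPLAIN EXECUTE q(42);

-- Runtime partition pruning (the subject of this patch) only matters
-- once the plancache switches to a generic plan, where $1 is unknown
-- at plan time -- which is the switch-over Beena is referring to.
EXPLAIN EXECUTE q(42);
```

That is why the plan-choice heuristic is a separable problem from the pruning infrastructure itself.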
Re: [HACKERS] why not parallel seq scan for slow functions
On Wed, Nov 8, 2017 at 6:48 PM, Robert Haas wrote:
> On Wed, Nov 8, 2017 at 7:26 AM, Amit Kapila wrote:
>> We do want to generate it later when there isn't inheritance involved,
>> but only if there is a single rel involved (simple_rel_array_size
>> <= 2). The rule is something like this: we will generate the gather
>> paths at this stage only if there are more than two rels involved and
>> there isn't inheritance involved.
>
> Why is that the correct rule?
>
> Sorry if I'm being dense here. I would have thought we'd want to skip
> it for the topmost scan/join rel regardless of the presence or absence
> of inheritance.

I think I understood your concern after some offlist discussion, and it
is primarily due to the inheritance-related check, which can skip the
generation of gather paths when it shouldn't. So what might fit better
here is a straight check on the number of base rels, such that we allow
generating a gather path in set_rel_pathlist only if there are multiple
baserels involved. I have used all_baserels, which I think will work
better for this purpose.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

parallel_paths_include_tlist_cost_v6.patch
Description: Binary data
Re: [HACKERS] [POC] Faster processing at Gather node
On Wed, Nov 8, 2017 at 1:02 AM, Andres Freund wrote:
> Hi,
>
> On 2017-11-06 10:56:43 +0530, Amit Kapila wrote:
>> On Sun, Nov 5, 2017 at 6:54 AM, Andres Freund wrote:
>>> On 2017-11-05 01:05:59 +0100, Robert Haas wrote:
>>>> skip-gather-project-v1.patch does what it says on the tin. I still
>>>> don't have a test case for this, and I didn't find that it helped
>>>> very much,
>>
>> I am also wondering in which case it can help, and I can't think of
>> the case.
>
> I'm confused? Isn't it fairly obvious that unnecessarily projecting
> at the gather node is wasteful? Obviously depending on the query you'll
> see smaller / bigger gains, but that there are benefits should be fairly
> obvious?

I agree that there could be benefits depending on the statement. I
initially thought that we were re-evaluating the expressions in the
target list as part of projection even if the worker backend had already
done that, but that was not the case; instead, we are deforming the
tuples sent by workers. Now, I think as a general principle it is a
good idea to delay the deforming as much as possible.

About the patch:

 /*
- * Initialize result tuple type and projection info.
- */
- ExecAssignResultTypeFromTL(&gatherstate->ps);
- ExecAssignProjectionInfo(&gatherstate->ps, NULL);
-
- /*
  * Initialize funnel slot to same tuple descriptor as outer plan.
  */
 if (!ExecContextForcesOids(&gatherstate->ps, &hasoid))
@@ -115,6 +109,12 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
 tupDesc = ExecTypeFromTL(outerNode->targetlist, hasoid);
 ExecSetSlotDescriptor(gatherstate->funnel_slot, tupDesc);

+ /*
+ * Initialize result tuple type and projection info.
+ */
+ ExecAssignResultTypeFromTL(&gatherstate->ps);
+ ExecConditionalAssignProjectionInfo(&gatherstate->ps, tupDesc, OUTER_VAR);
+

This change looks suspicious to me. I think here we can't use the
tupDesc constructed from the targetlist. One problem I could see is that
the check for the hasOid setting in tlist_matches_tupdesc won't give the
correct answer. In the case of a scan, we use the tuple descriptor
stored in the relation descriptor, which allows us to take the right
decision in tlist_matches_tupdesc. If you try the statement CREATE
TABLE as_select1 AS SELECT * FROM pg_class WHERE relkind = 'r'; with
force_parallel_mode=regress, then you can reproduce the problem I am
trying to highlight.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] why not parallel seq scan for slow functions
On Wed, Nov 8, 2017 at 4:34 PM, Robert Haas wrote:
> On Tue, Nov 7, 2017 at 9:41 PM, Amit Kapila wrote:
>> This is required to prohibit generating a gather path for the top rel
>> in case of inheritance (Append node) at this place (we want to
>> generate it later when the scan/join target is available).
>
> OK, but why don't we want to generate it later when there isn't
> inheritance involved?

We do want to generate it later when there isn't inheritance involved,
but only if there is a single rel involved (simple_rel_array_size <= 2).
The rule is something like this: we will generate the gather paths at
this stage only if there are more than two rels involved and there isn't
inheritance involved.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] why not parallel seq scan for slow functions
On Wed, Nov 8, 2017 at 2:51 AM, Robert Haas wrote:
> On Mon, Nov 6, 2017 at 9:57 PM, Amit Kapila wrote:
>
>>> Also, even if inheritance is used, we might still be the
>>> topmost scan/join target.
>>
>> Sure, but in that case, it won't generate the gather path here (due to
>> this part of the check, "!root->append_rel_list"). I am not sure
>> whether I have understood the second part of your question, so if my
>> answer appears inadequate, then can you provide more details on what
>> you are concerned about?
>
> I don't know why the question of whether root->append_rel_list is empty
> is the relevant thing here for deciding whether to generate gather
> paths at this point.

This is required to prohibit generating a gather path for the top rel in
case of inheritance (Append node) at this place (we want to generate it
later when the scan/join target is available). For such a case, the
reloptkind will be RELOPT_BASEREL and simple_rel_array_size will be
greater than two, as it includes the child rels as well. So, the check
for root->append_rel_list will prohibit generating a gather path for
such a rel. Now, for all the child rels of the Append, the reloptkind
will be RELOPT_OTHER_MEMBER_REL, so it won't generate a gather path here
because of the first part of the check (rel->reloptkind ==
RELOPT_BASEREL).

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] parallelize queries containing initplans
On Mon, Oct 30, 2017 at 9:00 AM, Amit Kapila wrote:
> On Wed, Oct 11, 2017 at 9:24 PM, Robert Haas wrote:
>> On Mon, Oct 9, 2017 at 5:56 AM, Amit Kapila wrote:
>>> How about always returning false for PARAM_EXTERN?
>>
>> Yeah, I think that's what we should do. Let's do that first as a
>> separate patch, which we might even want to back-patch, then return to
>> this.
>
> Now that the PARAM_EXTERN issue is fixed, I have rebased this patch.
> This patch had been switched to Ready For Committer in the last CF,
> then Robert had comments which I have addressed, so I think the status
> should be switched back to Ready For Committer.

As mentioned, changed the status of the patch in the CF app.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Plans and Cost of non-filter functions
On Mon, Nov 6, 2017 at 7:40 PM, Paul Ramsey wrote:
> From my perspective, this is much, much better. For sufficiently large
> tables, I get parallel behaviour without jimmying with the defaults on
> parallel_setup_cost and parallel_tuple_cost. *And*, the parallel
> behaviour *is* sensitive to the costs of functions in target lists, so
> reasonably chosen costs will flip us into a parallel mode for expensive
> functions against smaller tables too.

Thanks for the confirmation.

> Hopefully some variant of this finds its way into core! Is there any
> way I can productively help?

You have already helped a lot by providing the use case, but feel free
to ping on that thread if you find it is not moving.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] why not parallel seq scan for slow functions
On Mon, Nov 6, 2017 at 7:05 PM, Robert Haas wrote:
> On Mon, Nov 6, 2017 at 11:20 AM, Amit Kapila wrote:
>> On Mon, Nov 6, 2017 at 3:51 AM, Robert Haas wrote:
>>> This looks like it's on the right track to me. I hope Tom will look
>>> into it, but if he doesn't I may try to get it committed myself.
>>>
>>> -        if (rel->reloptkind == RELOPT_BASEREL)
>>> -            generate_gather_paths(root, rel);
>>> +        if (rel->reloptkind == RELOPT_BASEREL &&
>>> +            root->simple_rel_array_size > 2 &&
>>> +            !root->append_rel_list)
>>>
>>> This test doesn't look correct to me. Actually, it doesn't look
>>> anywhere close to correct to me. So, one of us is very confused...
>>> not sure whether it's you or me.
>>
>> It is quite possible that I haven't got it right, but it shouldn't be
>> completely bogus, as it passes the regression tests and some manual
>> verification. Can you explain what your concern about this test is?
>
> Well, I suppose that test will fire for a baserel when the total
> number of baserels is at least 3 and there's no inheritance involved.
> But if there are 2 baserels, we're still not the topmost scan/join
> target.

No, if there are 2 baserels, then simple_rel_array_size will be 3.
The simple_rel_array_size is always the number of relations plus one.
See setup_simple_rel_arrays.

> Also, even if inheritance is used, we might still be the
> topmost scan/join target.

Sure, but in that case, it won't generate the gather path here (due to
this part of the check, "!root->append_rel_list"). I am not sure
whether I have understood the second part of your question, so if my
answer appears inadequate, then can you provide more details on what
you are concerned about?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] why not parallel seq scan for slow functions
On Mon, Nov 6, 2017 at 3:51 AM, Robert Haas wrote:
> On Sun, Nov 5, 2017 at 12:57 AM, Amit Kapila wrote:
>> Thanks for the confirmation. Find rebased patch attached.
>
> This looks like it's on the right track to me. I hope Tom will look
> into it, but if he doesn't I may try to get it committed myself.
>
> -        if (rel->reloptkind == RELOPT_BASEREL)
> -            generate_gather_paths(root, rel);
> +        if (rel->reloptkind == RELOPT_BASEREL &&
> +            root->simple_rel_array_size > 2 &&
> +            !root->append_rel_list)
>
> This test doesn't look correct to me. Actually, it doesn't look
> anywhere close to correct to me. So, one of us is very confused...
> not sure whether it's you or me.

It is quite possible that I haven't got it right, but it shouldn't be
completely bogus, as it passes the regression tests and some manual
verification. Can you explain what your concern about this test is?

>         simple_gather_path = (Path *)
>             create_gather_path(root, rel, cheapest_partial_path,
>                                rel->reltarget, NULL, NULL);
> +
> +        /* Add projection step if needed */
> +        if (target && simple_gather_path->pathtarget != target)
> +            simple_gather_path = apply_projection_to_path(root, rel,
> +                                     simple_gather_path, target);
>
> Instead of using apply_projection_to_path, why not pass the correct
> reltarget to create_gather_path?

We need to push it to gather's subpath, as is done in
apply_projection_to_path, and then we have to cost it accordingly. I
think if we don't use apply_projection_to_path, then we might end up
with much of the code similar to it in generate_gather_paths. In fact,
I tried something similar to what you are suggesting in the first
version of the patch [1], and it didn't turn out to be clean. Also, I
think we already do something similar in create_ordered_paths.

> +    /* Set or update cheapest_total_path and related fields */
> +    set_cheapest(current_rel);
>
> I wonder if it's really OK to call set_cheapest() a second time for
> the same relation...

I think if we want, we can avoid it by checking whether we have
generated any gather path for the relation (basically, check if it has
a partial path list). Another idea could be that we consider the
generation of gather/gathermerge for the top-level scan/join relation
as a separate step and generate a new kind of upper rel for it, which
will be mostly dummy but will have paths for gather/gathermerge.

[1] - https://www.postgresql.org/message-id/CAA4eK1JUvL9WS9z%3D5hjSuSMNCo3TdBxFa0pA%3DE__E%3Dp6iUffUQ%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [POC] Faster processing at Gather node
On Sun, Nov 5, 2017 at 6:54 AM, Andres Freund wrote:
> On 2017-11-05 01:05:59 +0100, Robert Haas wrote:
>> skip-gather-project-v1.patch does what it says on the tin. I still
>> don't have a test case for this, and I didn't find that it helped very
>> much,

I am also wondering in which case it can help, and I can't think of the
case. Basically, as part of projection in the gather, I think we are
just deforming the tuple, which we anyway need to perform before
sending the tuple to the client (printtup) or probably at the upper
level of the node.

>> and you said this looked like a big bottleneck in your
>> testing, so here you go.

Is it possible that it shows the bottleneck only for an EXPLAIN ANALYZE
statement, as we don't deform the tuple for that at a later stage?

> The query where that showed a big benefit was
>
> SELECT * FROM lineitem WHERE l_suppkey > '5012' OFFSET 10 LIMIT 1;
>
> (i.e. a not very selective filter, and then just throwing the results
> away) still shows quite massive benefits:
>
> before:
> set parallel_setup_cost=0; set parallel_tuple_cost=0;
> set min_parallel_table_scan_size=0;
> set max_parallel_workers_per_gather=8;
> tpch_5[17938][1]=# explain analyze SELECT * FROM lineitem
>   WHERE l_suppkey > '5012' OFFSET 10 LIMIT 1;
>
>  Limit  (cost=635802.67..635802.69 rows=1 width=127) (actual time=8675.097..8675.097 rows=0 loops=1)
>    ->  Gather  (cost=0.00..635802.67 rows=27003243 width=127) (actual time=0.289..7904.849 rows=26989780 loops=1)
>          Workers Planned: 8
>          Workers Launched: 7
>          ->  Parallel Seq Scan on lineitem  (cost=0.00..635802.67 rows=3375405 width=127) (actual time=0.124..528.667 rows=3373722 loops=8)
>                Filter: (l_suppkey > 5012)
>                Rows Removed by Filter: 376252
>  Planning time: 0.098 ms
>  Execution time: 8676.125 ms
> (9 rows)
>
> after:
> tpch_5[19754][1]=# EXPLAIN ANALYZE SELECT * FROM lineitem
>   WHERE l_suppkey > '5012' OFFSET 10 LIMIT 1;
>
>  Limit  (cost=635802.67..635802.69 rows=1 width=127) (actual time=5984.916..5984.916 rows=0 loops=1)
>    ->  Gather  (cost=0.00..635802.67 rows=27003243 width=127) (actual time=0.214..5123.238 rows=26989780 loops=1)
>          Workers Planned: 8
>          Workers Launched: 7
>          ->  Parallel Seq Scan on lineitem  (cost=0.00..635802.67 rows=3375405 width=127) (actual time=0.025..649.887 rows=3373722 loops=8)
>                Filter: (l_suppkey > 5012)
>                Rows Removed by Filter: 376252
>  Planning time: 0.076 ms
>  Execution time: 5986.171 ms
> (9 rows)
>
> so there clearly is still benefit (this is scale 100, but that
> shouldn't make much of a difference).

Do you see the benefit if the query is executed without using EXPLAIN
ANALYZE?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
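One way to answer that last question without the per-tuple instrumentation overhead of EXPLAIN ANALYZE is psql's \timing, which reports client-observed elapsed time for the bare statement. A session sketch, assuming the same TPC-H setup as above:

```sql
\timing on
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
SET min_parallel_table_scan_size = 0;
SET max_parallel_workers_per_gather = 8;

-- Run the bare query; compare the reported "Time: ... ms"
-- before and after the patch:
SELECT * FROM lineitem WHERE l_suppkey > '5012' OFFSET 10 LIMIT 1;
```

Note this timing includes result transfer to the client, so it answers a slightly different question than the EXPLAIN ANALYZE numbers quoted above.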
Re: [HACKERS] Setting pd_lower in GIN metapage
On Sat, Nov 4, 2017 at 2:03 AM, Tom Lane wrote:
> Michael Paquier writes:
>> On Fri, Nov 3, 2017 at 1:10 AM, Amit Kapila wrote:
>>> On Fri, Nov 3, 2017 at 2:54 AM, Tom Lane wrote:
>>>> I've marked the CF entry closed. However, I'm not sure if we're
>>>> quite done with this thread. Weren't we going to adjust nbtree and
>>>> hash to be more aggressive about labeling their metapages as
>>>> REGBUF_STANDARD?
>>>
>>> I have already posted the patches [1] for the same in this thread,
>>> and those are reviewed [2][3] as well. I have adjusted the comments
>>> as per the latest commit. Please find updated patches attached.
>>
>> Confirmed. Setting those makes sense even if REGBUF_WILL_INIT is set,
>> at least for page masking.
>
> Thanks, I'd forgotten those patches were already posted. Looks good,
> so pushed.
>
> Looking around, I noted that contrib/bloom also had the disease of
> not telling log_newpage it was writing a standard-format metapage,
> so I fixed that too.

Thanks, Michael and Tom, for reviewing and committing the work.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Plans and Cost of non-filter functions
On Sat, Nov 4, 2017 at 4:43 AM, Tom Lane wrote:
> Paul Ramsey writes:
>>> Whether I get a parallel aggregate seems entirely determined by the
>>> number of rows, not the cost of preparing those rows.
>>
>> This is true, as far as I can tell, and unfortunate. Feeding tables
>> with 100ks of rows, I get parallel plans; feeding 10ks of rows, I
>> never do, no matter how costly the work going on within. That's true
>> of changing costs on the subquery select list, and on the aggregate
>> transfn.
>
> This sounds like it might be the same issue being discussed in
> https://www.postgresql.org/message-id/flat/CAMkU=1ycXNipvhWuweUVpKuyu6SpNjF=yhwu4c4us5jgvgx...@mail.gmail.com

I have rebased the patch being discussed on that thread. Paul, you
might want to check once with the recent patch [1] posted on the thread
mentioned by Tom.

[1] - https://www.postgresql.org/message-id/CAA4eK1%2B1H5Urm0_Wp-n5XszdLX1YXBqS_zW0f-vvWKwdh3eCJA%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] why not parallel seq scan for slow functions
On Thu, Sep 21, 2017 at 2:35 AM, Jeff Janes wrote:
> On Tue, Sep 19, 2017 at 9:15 PM, Amit Kapila wrote:
>> On Wed, Sep 20, 2017 at 3:05 AM, Jeff Janes wrote:
>>> On Tue, Sep 19, 2017 at 1:17 PM, Thomas Munro wrote:
>>>> On Thu, Sep 14, 2017 at 3:19 PM, Amit Kapila wrote:
>>>>> The attached patch fixes both the review comments as discussed
>>>>> above.
>>>
>>> that should be fixed by turning costs on the explain, as is the
>>> tradition.
>>
>> Right. BTW, did you get a chance to run the original test (for which
>> you have reported the problem) with this patch?
>
> Yes, this patch makes it use a parallel scan, with great improvement.

Thanks for the confirmation. Find rebased patch attached.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

parallel_paths_include_tlist_cost_v5.patch
Description: Binary data
Re: [HACKERS] parallelize queries containing initplans
On Mon, Oct 30, 2017 at 10:07 AM, Robert Haas wrote:
> On Mon, Oct 30, 2017 at 9:00 AM, Amit Kapila wrote:
>> Now that the PARAM_EXTERN issue is fixed, I have rebased this patch.
>> This patch had been switched to Ready For Committer in the last CF,
>> then Robert had comments which I have addressed, so I think the status
>> should be switched back to Ready For Committer. Let me know if you
>> think it should be switched to some other status.
>
> The change to ExplainPrintPlan doesn't look good to me, because it
> actually moves the initPlan; I don't think it's good for EXPLAIN to
> mutate the plan state tree. It should find a way to display the
> results *as if* the initPlans were attached to the subnode, but
> without actually moving them.

Actually, with the latest patch, we don't need these changes in
ExplainPrintPlan. Earlier, we needed these changes because the patch
had changed SS_charge_for_initplans to mark a path with an initplan as
parallel safe. However, after removing that change in the previous
version of the patch [1], this is not required, as now we won't add a
Gather on top of a plan node having an initplan.

[1] - https://www.postgresql.org/message-id/CAA4eK1JD%3DpJYBn8rN5RimiEVtPJmVNmyq5p6VoZBnUw2xRYB7w%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

pq_pushdown_initplan_v14.patch
Description: Binary data
Re: [HACKERS] Setting pd_lower in GIN metapage
On Fri, Nov 3, 2017 at 2:54 AM, Tom Lane wrote: > Amit Langote writes: >> On 2017/09/26 16:30, Michael Paquier wrote: >>> Cool, let's switch it back to a ready for committer status then. > >> Sure, thanks. > > Pushed with some cosmetic adjustments --- mostly, making the comments more > explicit about why we need the apparently-redundant assignments to > pd_lower. > > I've marked the CF entry closed. However, I'm not sure if we're quite > done with this thread. Weren't we going to adjust nbtree and hash to > be more aggressive about labeling their metapages as REGBUF_STANDARD? > I have already posted the patches [1] for the same in this thread, and they have been reviewed [2][3] as well. I have adjusted the comments as per the latest commit. Please find the updated patches attached. [1] - https://www.postgresql.org/message-id/CAA4eK1%2BTKg6ZK%3DmF14x_wf2KrmOxoMJ6z7YUK3-78acaYLwQ8Q%40mail.gmail.com [2] - https://www.postgresql.org/message-id/CAB7nPqTE4-GCaLtDh%3DJBcgUKR6B5WkvRLC-NpOqkgybi4FhHPw%40mail.gmail.com [3] - https://www.postgresql.org/message-id/CAB7nPqQBtDW43ABnWEdoHP6A2ToedzDFdpykbGjpO2wuZNiQnw%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com change_metapage_usage_btree-v3.patch Description: Binary data change_metapage_usage_hash-v3.patch Description: Binary data
Re: [HACKERS] Parallel worker error
On Mon, Oct 30, 2017 at 10:04 AM, Robert Haas wrote: > On Mon, Oct 30, 2017 at 8:25 AM, Amit Kapila wrote: >> Thanks. I have closed this entry in the CF app; however, I am not sure >> what the best way is to deal with the entry present on the PostgreSQL 10 >> Open Items page [1]. The entry for this bug seems to be present in the >> Older Bugs section. Shall we leave it as it is or do we want to do >> something else with it? >> >> [1] - https://wiki.postgresql.org/wiki/PostgreSQL_10_Open_Items > > How about just adding an additional bullet point for that item that > says "fixed by commit blah blah for 10.1"? > Sounds good, so I updated the Open Items page accordingly. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] parallelize queries containing initplans
On Wed, Oct 18, 2017 at 2:06 PM, tushar wrote: > On 10/11/2017 12:42 PM, tushar wrote: >> >> On 10/09/2017 03:26 PM, Amit Kapila wrote: >>> >>> I have reverted the check >>> in the attached patch. >> >> >> I have applied this patch against PG HEAD, run sqlsmith, and analyzed the >> results. I didn't find any specific failures against this patch. >> > I did some testing of this feature and wrote a few testcases. PFA the sql > file (along with the expected .out file). > Thanks a lot, Tushar, for testing this patch. In the latest patch, I have just rebased some comments; there is no code change, so I don't expect any change in behavior, but feel free to test it once again. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] parallelize queries containing initplans
On Wed, Oct 11, 2017 at 9:24 PM, Robert Haas wrote: > On Mon, Oct 9, 2017 at 5:56 AM, Amit Kapila wrote: >> How about always returning false for PARAM_EXTERN? > > Yeah, I think that's what we should do. Let's do that first as a > separate patch, which we might even want to back-patch, then return to > this. > Now that the PARAM_EXTERN issue is fixed, I have rebased this patch. This patch had been switched to Ready For Committer in the last CF, then Robert had comments which I have addressed, so I think the status should be switched back to Ready For Committer. Let me know if you think it should be switched to some other status. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com pq_pushdown_initplan_v13.patch Description: Binary data
Re: [HACKERS] Parallel worker error
On Sun, Oct 29, 2017 at 1:15 PM, Robert Haas wrote: > On Sun, Oct 29, 2017 at 12:02 PM, Amit Kapila wrote: >> This patch no longer applies, so attached rebased patches. I have >> also created patches for v10 as we might want to backpatch the fix. >> Added the patch in CF (https://commitfest.postgresql.org/15/1342/) > > Thanks. I picked the second variant, committed, and also back-patched to 9.6. > Thanks. I have closed this entry in the CF app; however, I am not sure what the best way is to deal with the entry present on the PostgreSQL 10 Open Items page [1]. The entry for this bug seems to be present in the Older Bugs section. Shall we leave it as it is or do we want to do something else with it? [1] - https://wiki.postgresql.org/wiki/PostgreSQL_10_Open_Items -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel safety for extern params
On Sun, Oct 29, 2017 at 9:55 PM, Robert Haas wrote: > On Sat, Oct 28, 2017 at 8:02 PM, Amit Kapila wrote: >> I think we need to make changes in exec_simple_recheck_plan to make >> the behavior similar to HEAD. With the attached patch, all tests >> passed with force_parallel_mode. > > Committed to REL_10_STABLE. > Thanks, I have closed this entry in the CF app. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel worker error
On Sat, Sep 9, 2017 at 7:56 AM, Amit Kapila wrote: > On Fri, Sep 8, 2017 at 3:13 PM, Robert Haas wrote: >> On Fri, Sep 8, 2017 at 1:17 AM, Amit Kapila wrote: >>> You are right. I have changed the ordering and passed OuterUserId via >>> FixedParallelState. >> >> This looks a little strange: >> >> +SetCurrentRoleId(fps->outer_user_id, fps->is_current_user_superuser); >> >> The first argument says "outer" but the second says "current". I'm >> wondering if we can just make the second one is_superuser. >> > > No issues; changed as per the suggestion. > >> I'm also wondering if, rather than using GetConfigOptionByName, we >> should just make the GUC underlying is_superuser non-static and use >> the value directly. If not, then I'm alternatively wondering whether >> we should maybe use a less-generic name than varval. >> > > I think we can go either way. So I prepared patches with both > approaches. In fix_role_handling_parallel_worker_v3_1.patch, I have > changed the variable name, and in > fix_role_handling_parallel_worker_v3_2.patch, I have exposed the GUC > is_superuser. > This patch no longer applies, so attached rebased patches. I have also created patches for v10 as we might want to backpatch the fix. Added the patch in CF (https://commitfest.postgresql.org/15/1342/) -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com fix_role_handling_parallel_worker_v4_1.patch Description: Binary data fix_role_handling_parallel_worker_v4_2.patch Description: Binary data fix_role_handling_parallel_worker_10_v4_1.patch Description: Binary data fix_role_handling_parallel_worker_10_v4_2.patch Description: Binary data
Re: [HACKERS] Parallel safety for extern params
On Sat, Oct 28, 2017 at 2:02 AM, Robert Haas wrote: > On Mon, Oct 16, 2017 at 12:59 PM, Amit Kapila wrote: > When I tried back-porting the patch to v10 I discovered that the > plpgsql changes conflict heavily and that ripping them out all the way > produces regression failures under force_parallel_mode. I think I see > why that's happening but it's not obvious to me how to adapt > b73f1b5c29e0ace5a281bc13ce09dea30e1b66de to the v10 code. Do you want > to have a crack at it or should we just leave this as a master-only > fix? > I think we need to make changes in exec_simple_recheck_plan to make the behavior similar to HEAD. With the attached patch, all tests passed with force_parallel_mode. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com fix_parallel_safety_for_extern_params_v2.patch Description: Binary data
Re: [HACKERS] Re: Burst in WAL size when UUID is used as PK while full_page_writes are enabled
On Fri, Oct 27, 2017 at 5:46 PM, Thomas Kellerer wrote: > akapila wrote: > >> You might want to give a try with the hash index if you are planning >> to use PG10 and your queries involve equality operations. > > But you can't replace the PK index with a hash index, because hash indexes > don't support uniqueness. > That's true, but it wasn't mentioned in the mail that the hash index would be used for the primary key. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Burst in WAL size when UUID is used as PK while full_page_writes are enabled
On Fri, Oct 27, 2017 at 5:36 PM, Alvaro Herrera wrote: > Amit Kapila wrote: > >> You might want to give a try with the hash index if you are planning >> to use PG10 and your queries involve equality operations. > > So, btree indexes on monotonically increasing sequences don't write tons > of full page writes because typically the same page is touched many > times by insertions on each checkpoint cycle; so only one or very few > full page writes are generated for a limited number of index pages. > > With UUID you lose locality of access: each insert goes to a different > btree page, so you generate tons of full page writes because the number > of modified index pages is very large. > > With hash on monotonically increasing keys, my guess is that you get > behavior similar to btrees on UUID: the inserts are all over the place > in the index, so tons of full page writes. Am I wrong? > > With hash on UUID, the same thing should happen. Am I wrong? > If the bucket pages were decided merely based on the hash key, then what you are saying would be right. However, we mask the hash key with the high|low mask, due to which it falls on one of the existing pages in the hash index. Also, I suggested it based on some of the tests we have done on a UUID column, and the result was that most of the time the hash index size was smaller than the btree size. See pages 15-17 of the hash index presentation at the last PGCon [1]. [1] - https://www.pgcon.org/2017/schedule/attachments/458_durable-hash-indexes-postgresql.pdf -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] parallel.c oblivion of worker-startup failures
On Tue, Sep 19, 2017 at 8:47 AM, Amit Kapila wrote: > On Mon, Sep 18, 2017 at 10:00 PM, Tom Lane wrote: >> Amit Kapila writes: >>> Attached patch fixes these problems. >> >> Hmm, this patch adds a kill(notify_pid) after one call to >> ForgetBackgroundWorker, but the postmaster has several more such calls. >> Shouldn't they all notify the notify_pid? Should we move that >> functionality into ForgetBackgroundWorker itself, so we can't forget >> it again? >> > > Among other places, we already notify during > ReportBackgroundWorkerExit(). However, it seems to me that all the other > places, except where this patch has added a call to notify, don't need > such a call. The other places, like DetermineSleepTime and > ResetBackgroundWorkerCrashTimes, are called for a crashed worker, which I > don't think requires notification to the backend, as that backend > itself would have restarted. The other place where we call > ForgetBackgroundWorker is in maybe_start_bgworkers when rw_terminate > is set to true, which again seems to be either the case of a worker crash > or of someone having explicitly asked to terminate the worker, in which > case we already send a notification. > > I think we need to notify the backend on start, stop, and failure to > start a worker. OTOH, if it is harmless to send a notification even > after the worker has crashed, then we can just move that functionality > into ForgetBackgroundWorker itself, as that will simplify the code; in > fact, that is the first thing that occurred to me as well, but I > haven't done it that way as I was not sure if we want to send a > notification in all kinds of cases. > The patch still applies (with some hunks). I have added it to the CF [1] to avoid losing track. [1] - https://commitfest.postgresql.org/15/1341/ -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Burst in WAL size when UUID is used as PK while full_page_writes are enabled
On Fri, Oct 27, 2017 at 11:26 AM, sanyam jain wrote: > Hi, > > I was reading the blog > https://blog.2ndquadrant.com/on-the-impact-of-full-page-writes . > > My queries: > > How is the randomness of UUID likely to create new leaf pages in the btree index? > In my understanding, as the size of UUID is 128 bits, i.e. twice that of BIGSERIAL, > more pages will be required to store the same number of rows, and hence there can > be an increase in WAL size due to FPW. > When I compared the index sizes in a local setup, the UUID index was ~2x greater > in size. > You might want to give a try with the hash index if you are planning to use PG10 and your queries involve equality operations. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel safety for extern params
On Mon, Oct 16, 2017 at 4:29 PM, Amit Kapila wrote: > On Sat, Oct 14, 2017 at 1:51 AM, Robert Haas wrote: > >> I think the bug is in ExecGather(Merge): it assumes that if we're in >> parallel mode, it's OK to start workers. But actually, it shouldn't >> do this unless ExecutePlan ended up with use_parallel_mode == true, >> which isn't quite the same thing. I think we might need ExecutePlan >> to set a flag in the estate that ExecGather(Merge) can check. >> > > Attached patch fixes the problem in a suggested way. > > All the patches still apply and pass the regression tests. I have added this to the CF to avoid losing track of this patch. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] CurTransactionContext freed before transaction COMMIT ???
On Wed, Oct 25, 2017 at 8:04 PM, Gaddam Sai Ram wrote: > Thanks for the response, > > Can you check if CurTransactionContext is valid at that point? > > > Any Postgres function to check if CurTransactionContext is valid or not? > > To see if this problem is related to CurTransactionContext, can you try to > populate the list in TopMemoryContext and see if that works. > > > Did you mean TopTransactionContext? > No, I mean what I have written. I suspect in your case TopTransactionContext will be the same as CurTransactionContext because you don't have any subtransaction. > As of now, we don't free our dlist. We solely depend on Postgres to free our > dlist while it frees the TopTransactionContext. But if we do allocate in > TopMemoryContext, we need to take care of freeing our allocations. > Can't we do it temporarily to test? I am not suggesting making this a permanent change, rather a way to see the reason for the problem. > And one more issue is, we found this bug once in all the testing we did. So > trying to replay this bug seems very difficult. > Oh, then it is tricky. I think there is a good chance that this is an issue in your application code, where you probably haven't used memory contexts as required, so you probably need to figure out a way to reproduce this issue; otherwise, it might be difficult to track down the actual cause. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] CurTransactionContext freed before transaction COMMIT ???
On Tue, Oct 24, 2017 at 7:42 PM, Gaddam Sai Ram wrote: > > Hello people, > > We are implementing an in-memory index. As a part of that, during > index callbacks, under CurTransactionContext, we cache all the DMLs of a > transaction in a dlist (Postgres's doubly linked list). > We registered a transaction callback, and in the transaction > pre-commit event (XACT_EVENT_PRE_COMMIT), we iterate through all cached > DMLs (dlist) and populate our in-memory data structure. > --> For a detailed explanation of our implementation, please look > into the below thread. > https://www.postgresql.org/message-id/15f4ea99b34.e69a4e1b633.8937474039794097334%40zohocorp.com > --> It was working fine until I was greeted with a segmentation > fault while accessing the dlist! > --> On further examination I found that the dlist head is not NULL > and it is pointing to some address, but the container I extracted is > pointing to 0x7f7f7f7f7f7f7f5f, and I was not able to access any members of > my container. > --> https://wiki.postgresql.org/wiki/Developer_FAQ#Why_are_my_variables_full_of_0x7f_bytes.3F > From the above wiki, we came to the conclusion that the memory > context in which that dlist was present was freed. > And yes, CLOBBER_FREED_MEMORY is enabled by passing > --enable-cassert to enable asserts in my code. > --> Normally the memory context inside XACT_EVENT_PRE_COMMIT is > TopTransactionContext, but in this particular case the memory context was > 'MessageContext'. A little unusual! Not sure why! > --> So basically CurTransactionContext shouldn't get freed > before transaction COMMIT. > --> So what has actually happened here??? Kindly help us with > your insights! > Can you check if CurTransactionContext is valid at that point? To see if this problem is related to CurTransactionContext, can you try to populate the list in TopMemoryContext and see if that works. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Pluggable storage
On Wed, Oct 25, 2017 at 11:37 AM, Robert Haas wrote: > On Fri, Oct 20, 2017 at 5:47 AM, Amit Kapila wrote: >> I think what we need here is a way to register the satisfies function >> (SnapshotSatisfiesFunc) in SnapshotData for different storage engines. > > I don't see how that helps very much. SnapshotSatisfiesFunc takes a > HeapTuple as an argument, and it cares in detail about that tuple's > xmin, xmax, and infomask, and it sets hint bits. All of that is bad, > because an alternative storage engine is likely to use a different > format than HeapTuple and to not have hint bits (or at least not in > the same form we have them now). Also, it doesn't necessarily have a > Boolean answer to the question "can this snapshot see this tuple?". > It may be more like "given this TID, what tuple if any can I see > there?" or "given this tuple, what version of it would I see with this > snapshot?". > > Another thing to consider is that, if we could replace satisfiesfunc, > it would probably break some existing code. There are multiple places > in the code that compare snapshot->satisfies to > HeapTupleSatisfiesHistoricMVCC and HeapTupleSatisfiesMVCC. > > I think the storage API should just leave snapshots alone. If a > storage engine wants to call HeapTupleSatisfiesVisibility() with that > snapshot, it can do so. Otherwise it can switch on > snapshot->satisfies and handle each case however it likes. > How will it switch on satisfies at runtime? There are places where we might know which visibility function (*MVCC, *Dirty, etc.) needs to be called, but I think there are other places (like heap_fetch) where it is not clear, and we decide based on what is stored in snapshot->satisfies. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] BLK_DONE state in XLogReadBufferForRedoExtended
On Tue, Oct 17, 2017 at 7:53 AM, Michael Paquier wrote: > On Mon, Oct 16, 2017 at 9:50 PM, Amit Kapila wrote: >> If the above analysis is correct, then I think we can say that the row state >> in a page will be the same during recovery as it was when the original >> operation was performed if full_page_writes is enabled. I am not >> sure how much this can help in the current heap format, but this can help >> in zheap (undo based heap). > > If I understood that correctly, that looks like a sane assumption. For > REGBUF_NO_IMAGE you may need to be careful though with undo > operations. > Right, but as of now, we don't need to use this assumption with REGBUF_NO_IMAGE. >> In zheap, we are writing the complete tuple for a Delete operation in the undo >> so that we can reclaim the corresponding tuple space as soon as the >> deleting transaction is committed. Now, during recovery, we have to >> generate the complete undo record (which includes the entire tuple), and >> for that ideally we should write the complete tuple in the WAL, but >> instead of that, I think we can regenerate it from the original page. >> This is only applicable when full_page_writes is enabled; otherwise, >> a complete tuple is required in the WAL. > > Yeah, you should really try to support both modes as well. > I also think so. > Fortunately, it is possible to know if full page writes are enforced > at the moment a record is assembled and inserted, so you could rely on > that. > Yeah, but actually we need to know whether full page writes are enforced while forming the record (something like in log_heap_update). Now, ideally, to read the flags XLogCtlInsert->fullPageWrites or XLogCtlInsert->forcePageWrites, we need an insertion lock as we do in XLogInsertRecord. However, I think we don't need an insertion lock to read the values for this purpose; rather, we can use GetFullPageWriteInfo, which doesn't need a lock.
The reason is that if the value of doPageWrites is true while forming and assembling the WAL records, then we will include a copy of the page even if the value changes in XLogInsertRecord. OTOH, if it is false while forming and assembling the WAL records, then we would have to include the undo tuple in the WAL record anyway, which avoids the dependency on the full-page image, so even if doPageWrites changes to true in XLogInsertRecord, we don't need to worry. >> I am not sure how much of the above makes sense to anyone without a detailed >> explanation, but I thought I should give some background on why I >> asked this question. However, if anybody needs more explanation or >> sees any fault in the above understanding, please let me know. > > Thanks for clarifying, I was wondering about the reason behind the question > as well. It is the second time that I have seen the word zheap on -hackers, > and the first time was no longer than 2 days ago, by Robert. > This is a big undertaking and will take time to reach a stage where the whole project can be shared, but some of the important design points which are closely linked with existing technology can be discussed earlier to avoid making wrong assumptions. Thanks for having a discussion on this topic. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [POC] Faster processing at Gather node
On Thu, Oct 19, 2017 at 1:16 AM, Robert Haas wrote: > On Tue, Oct 17, 2017 at 5:39 PM, Andres Freund wrote: > >>b) Use shm_mq_sendv in tqueue.c by batching up insertions into the >> queue whenever it's not empty when a tuple is ready. > > Batching them with what? I hate to postpone sending tuples we've got; > that sounds nice in the case where we're sending tons of tuples at > high speed, but there might be other cases where it makes the leader > wait. > I think Rafia's latest patch on the thread [1] has done something similar, where tuples are sent as long as there is space in the shared memory queue, and then it turns to batching the tuples using local queues. [1] - https://www.postgresql.org/message-id/CAOGQiiNiMhq5Pg3LiYxjfi2B9eAQ_q5YjS%3DfHiBJmbSOF74aBQ%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [POC] Faster processing at Gather node
On Wed, Oct 18, 2017 at 3:09 AM, Andres Freund wrote: > Hi Everyone, > > On 2017-05-19 17:25:38 +0530, Rafia Sabih wrote: >> While analysing the performance of TPC-H queries for the newly developed >> parallel operators, viz, parallel index, bitmap heap scan, etc., we noticed >> that the time taken by the gather node is significant. On investigation, as per >> the current method it copies each tuple to the shared queue and notifies >> the receiver. Since this copying is done in the shared queue, there is a lot of >> locking and latching overhead. >> >> So, in this POC patch I tried to copy all the tuples into a local queue, thus >> avoiding all the locks and latches. Once the local queue is filled to >> its capacity, tuples are transferred to the shared queue. Once all the >> tuples are transferred, the receiver is notified about the same. >> >> With this patch I could see significant improvement in performance for >> simple queries, > > I've spent some time looking into this, and I'm not quite convinced this > is the right approach. > As per my understanding, the patch in this thread is dead (not required) after the patch posted by Rafia in the thread "Effect of changing the value for PARALLEL_TUPLE_QUEUE_SIZE" [1]. There seem to be two related problems in this area: the first is shared-memory queue communication overhead, and the second is that workers start to stall when the shared-memory queue gets full. We can observe the first problem in simple queries that use Gather and the second in Gather Merge kinds of scenarios. We are trying to resolve both problems with the patch posted in the other thread. I think there is some similarity with the patch posted in this thread, but it is different. I have mentioned something similar upthread as well. > Let me start by describing where I see the > current performance problems around gather stemming from. > > The observations here are made using > select * from t where i < 3000 offset 2999 - 1; > with Rafia's data.
That avoids slowdowns on the leader due to too many > rows printed out. Sometimes I'll also use > SELECT * FROM lineitem WHERE l_suppkey > '5012' OFFSET 10 LIMIT 1; > on tpch data to show the effects on wider tables. > > The precise query doesn't really matter, the observations here are more > general, I hope. > > 1) nodeGather.c re-projects every row from workers. As far as I can tell >that's now always exactly the same targetlist as it's coming from the >worker. Projection capability was added in 8538a6307049590 (without >checking whether it's needed afaict), but I think it is in turn often >obsoleted by 992b5ba30dcafdc222341505b072a6b009b248a7. My >measurement shows that removing the projection yields quite massive >speedups in queries like yours, which is not too surprising. > >I suspect this just needs a tlist_matches_tupdesc check + an if >around ExecProject(). And a test; right now tests pass unless >force_parallel_mode is used, even if just commenting out the >projection unconditionally. > +1. I think we should do something to avoid this. > >Commenting out the memory barrier - which is NOT CORRECT - improves >timing: >before: 4675.626 >after: 4125.587 > >The correct fix obviously is not to break latch correctness. I think >the big problem here is that we perform a SetLatch() for every read >from the latch. > >I think we should >a) introduce a batch variant for receiving like: > > extern shm_mq_result shm_mq_receivev(shm_mq_handle *mqh, > shm_mq_iovec *iov, int *iovcnt, > bool nowait) > > which then only does the SetLatch() at the end. And maybe if it > wrapped around. > >b) Use shm_mq_sendv in tqueue.c by batching up insertions into the > queue whenever it's not empty when a tuple is ready. > I think the patch posted in the other thread has tried to achieve such batching by using local queues; maybe we should try some other way.
> > Unfortunately the patch's "local worker queue" concept seems, to me, > like it's not quite addressing the structural issues, instead opting to > address them by adding another layer of queuing. > That is done to batch the tuples while sending them. Sure, we can do some of the other things as well, but I think the main advantage comes from batching the tuples in a smart way while sending them, and once that is done, we might not need many of the other optimizations. [1] - https://www.postgresql.org/message-id/CAOGQiiNiMhq5Pg3LiYxjfi2B9eAQ_q5YjS%3DfHiBJmbSOF74aBQ%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Pluggable storage
On Sat, Oct 14, 2017 at 1:09 AM, Alexander Korotkov wrote: > On Fri, Oct 13, 2017 at 9:41 PM, Robert Haas wrote: >> >> On Fri, Oct 13, 2017 at 1:59 PM, Peter Geoghegan wrote: >> >> Fully agreed. >> > >> > If we implement that interface, where does that leave EvalPlanQual()? > > > At first glance, it seems that pluggable storage should override > EvalPlanQualFetch(); the rest of EvalPlanQual() looks quite generic. > I think there is more to it. Currently, EState->es_epqTuple is a HeapTuple, which is filled as part of the EvalPlanQual mechanism and then used later during the scan. We need to make it pluggable in some way so that other heaps can work. We also need some work for EvalPlanQualFetchRowMarks, as that also seems to be tightly coupled with HeapTuple. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Pluggable storage
On Fri, Oct 13, 2017 at 1:58 PM, Haribabu Kommi wrote: > > On Fri, Oct 13, 2017 at 11:55 AM, Robert Haas wrote: >> >> On Thu, Oct 12, 2017 at 8:00 PM, Haribabu Kommi >> wrote: >> > Currently I added a snapshot_satisfies API to find out whether the tuple >> > satisfies the visibility or not with different types of visibility >> > routines. >> > I feel these >> > are somehow enough to develop different storage methods like UNDO. >> > The storage methods can decide internally how to provide the visibility. >> > >> > + amroutine->snapshot_satisfies[MVCC_VISIBILITY] = >> > HeapTupleSatisfiesMVCC; >> > + amroutine->snapshot_satisfies[SELF_VISIBILITY] = >> > HeapTupleSatisfiesSelf; >> > + amroutine->snapshot_satisfies[ANY_VISIBILITY] = HeapTupleSatisfiesAny; >> > + amroutine->snapshot_satisfies[TOAST_VISIBILITY] = >> > HeapTupleSatisfiesToast; >> > + amroutine->snapshot_satisfies[DIRTY_VISIBILITY] = >> > HeapTupleSatisfiesDirty; >> > + amroutine->snapshot_satisfies[HISTORIC_MVCC_VISIBILITY] = >> > HeapTupleSatisfiesHistoricMVCC; >> > + amroutine->snapshot_satisfies[NON_VACUUMABLE_VISIBILTY] = >> > HeapTupleSatisfiesNonVacuumable; >> > + >> > + amroutine->snapshot_satisfiesUpdate = HeapTupleSatisfiesUpdate; >> > + amroutine->snapshot_satisfiesVacuum = HeapTupleSatisfiesVacuum; >> > >> > Currently no changes are carried out in the snapshot logic, as that is kept >> > separate from the storage API. >> >> That seems like a strange choice of API. I think it should be more >> integrated with the scan logic. For example, if I'm doing an index >> scan, and I get a TID, then I should be able to just say "here's a >> TID, give me any tuples associated with that TID that are visible to >> the scan snapshot". Then for the current heap it will do >> heap_hot_search_buffer, and for zheap it will walk the undo chain and >> return the relevant tuple from the chain. > > > OK, understood. > I will check along these lines and come up with storage APIs.
> I think what we need here is a way to register a satisfies function (SnapshotSatisfiesFunc) in SnapshotData for each storage engine. That is the core API for deciding visibility with respect to different storage engines. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BLK_DONE state in XLogReadBufferForRedoExtended
On Fri, Oct 13, 2017 at 11:57 AM, Amit Kapila wrote: > On Fri, Oct 13, 2017 at 10:32 AM, Michael Paquier > wrote: >> On Thu, Oct 12, 2017 at 10:47 PM, Amit Kapila >> wrote: >>> Today, I was trying to think about cases when we can return BLK_DONE >>> in XLogReadBufferForRedoExtended. One thing that occurred to me is >>> that it can happen during the replay of WAL if the full_page_writes is >>> off. Basically, when the full_page_writes is on, then during crash >>> recovery, it will always first restore the full page image of page and >>> then apply the WAL corresponding to that page, so we will never hit >>> the case where the lsn of the page is greater than lsn of WAL record. >>> >>> Are there other cases for which we can get BLK_DONE state? Is it >>> mentioned somewhere in code/comments which I am missing? >> >> Remember the thread about meta page optimization... Some index AMs use >> XLogInitBufferForRedo() to redo their meta pages and those don't have >> a FPW so when doing crash recovery you may finish by not replaying a >> meta page if its LSN on the page header is newer than what's being >> replayed. >> > > I think for metapage usage, it will anyway apply the changes. See > _bt_restore_page. > >> That's another case where BLK_DONE can be reached, even if >> full_page_writes is on. >> > > Yeah and probably if someone uses REGBUF_NO_IMAGE. However, I was > mainly thinking about cases where caller is not doing anything to > prevent full_page_image explicitly. > > If the above analysis is correct, then I think we can say that the row state in a page will be the same during recovery as it was when the original operation was performed, provided full_page_writes is enabled. I am not sure how much this can help in the current heap format, but it can help in zheap (an undo-based heap). In zheap, we write the complete tuple to the undo log for a Delete operation so that we can reclaim the corresponding tuple space as soon as the deleting transaction commits. 
Now, during recovery, we have to generate the complete undo record (which includes the entire tuple), and for that we would ideally write the complete tuple in WAL; but instead of that, I think we can regenerate it from the original page. This is only applicable when full_page_writes is enabled; otherwise, the complete tuple is required in WAL. I am not sure how much of the above makes sense without a detailed explanation, but I thought I should give some background on why I asked this question. If anybody needs more explanation or sees any fault in the above understanding, please let me know. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: enabling parallel execution for cursors explicitly (experimental)
On Sat, Oct 14, 2017 at 11:44 PM, Tomas Vondra wrote: > Hi, > > > I propose is to add a new cursor option (PARALLEL), which would allow > parallel plans for that particular user-defined cursor. Attached is an > experimental patch doing this (I'm sure there are some loose ends). > How will this work for backward scans? > > Regarding (2), if the user suspends the cursor for a long time, bummer. > The parallel workers will remain assigned, doing nothing. I don't have > any idea how to get around that, but I don't see how we could do better. > One idea could be that whenever someone uses the PARALLEL option, we fetch and store all the data locally before returning control to the user (something like the WITH HOLD option). -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel safety for extern params
On Sat, Oct 14, 2017 at 1:51 AM, Robert Haas wrote: > On Fri, Oct 13, 2017 at 1:19 AM, Amit Kapila wrote: >> After fixing this problem, when I ran the regression tests with >> force_parallel_mode = regress, I saw multiple other failures. All the >> failures boil down to two kinds of cases: >> >> 1. There was an assumption while extracting simple expressions that >> the target list of gather node can contain constants or Var's. Now, >> once the above patch allows extern params as parallel-safe, that >> assumption no longer holds true. We need to handle params as well. >> Attached patch fix_simple_expr_interaction_gather_v1.patch handles >> that case. > > - * referencing the child node's output ... but setrefs.c might also have > - * copied a Const as-is. > + * referencing the child node's output or a Param... but setrefs.c might > + * also have copied a Const as-is. > > I think the Param case should be mentioned after "... but" not before > - i.e. referencing the child node's output... but setrefs.c might also > have copied a Const or Param is-is. > I am not sure if we can write the comment like that (.. copied a Const or Param as-is.) because fix_upper_expr_mutator in setrefs.c has a special handling for Var and Param where constants are copied as-is via expression_tree_mutator. Also as proposed, the handling for params is more like Var in exec_save_simple_expr. >> 2. We don't allow execution to use parallelism if the plan can be >> executed multiple times. This has been enforced in ExecutePlan, but >> it seems like that we miss to handle the case where we are already in >> parallel mode by the time we enforce that condition. So, what >> happens, as a result, is that it will allow to use parallelism when it >> shouldn't (when the same plan is executed multiple times) and lead to >> a crash. One way to fix is that we temporarily exit the parallel mode >> in such cases and reenter in the same state once the current execution >> is over. 
Attached patch fix_parallel_mode_nested_execution_v1.patch >> fixes this problem. > > This seems completely unsafe. If somebody's already entered parallel > mode, they are counting on having the parallel-mode restrictions > enforced until they exit parallel mode. We can't just disable those > restrictions for a while in the middle and then put them back. > Right. > I think the bug is in ExecGather(Merge): it assumes that if we're in > parallel mode, it's OK to start workers. But actually, it shouldn't > do this unless ExecutePlan ended up with use_parallel_mode == true, > which isn't quite the same thing. I think we might need ExecutePlan > to set a flag in the estate that ExecGather(Merge) can check. > Attached patch fixes the problem in a suggested way. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com fix_parallel_mode_nested_execution_v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BLK_DONE state in XLogReadBufferForRedoExtended
On Fri, Oct 13, 2017 at 10:32 AM, Michael Paquier wrote: > On Thu, Oct 12, 2017 at 10:47 PM, Amit Kapila wrote: >> Today, I was trying to think about cases when we can return BLK_DONE >> in XLogReadBufferForRedoExtended. One thing that occurred to me is >> that it can happen during the replay of WAL if the full_page_writes is >> off. Basically, when the full_page_writes is on, then during crash >> recovery, it will always first restore the full page image of page and >> then apply the WAL corresponding to that page, so we will never hit >> the case where the lsn of the page is greater than lsn of WAL record. >> >> Are there other cases for which we can get BLK_DONE state? Is it >> mentioned somewhere in code/comments which I am missing? > > Remember the thread about meta page optimization... Some index AMs use > XLogInitBufferForRedo() to redo their meta pages and those don't have > a FPW so when doing crash recovery you may finish by not replaying a > meta page if its LSN on the page header is newer than what's being > replayed. > I think for metapage usage, it will anyway apply the changes. See _bt_restore_page. > That's another case where BLK_DONE can be reached, even if > full_page_writes is on. > Yeah and probably if someone uses REGBUF_NO_IMAGE. However, I was mainly thinking about cases where caller is not doing anything to prevent full_page_image explicitly. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallelize queries containing initplans
On Wed, Oct 11, 2017 at 9:24 PM, Robert Haas wrote: > On Mon, Oct 9, 2017 at 5:56 AM, Amit Kapila wrote: >> How about always returning false for PARAM_EXTERN? > > Yeah, I think that's what we should do. Let's do that first as a > separate patch, which we might even want to back-patch, then return to > this. > Sure, I have started a new thread, as this fix led to some other failures. I hope that is okay. https://www.postgresql.org/message-id/CAA4eK1%2B_BuZrmVCeua5Eqnm4Co9DAXdM5HPAOE2J19ePbR912Q%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Parallel safety for extern params
As discussed in a nearby thread [1] (parallelize queries containing initplans), it appears that there are cases where queries referring to PARAM_EXTERN params are treated as parallel-restricted even though they should be parallel-safe. I have done some further investigation and found that for most cases they are actually treated as parallel-restricted (due to tests in max_parallel_hazard_walker). The cases where such queries are treated as parallel-safe are when eval_const_expressions_mutator converts the param to a const; so whenever we select the generic plan, parallelism is not used. One simple example is:

create table t1(c1 int, c2 char(500));
insert into t1 values(generate_series(1,1),'aaa');
analyze t1;
set force_parallel_mode = on;

regression=# prepare t1_select(int) as Select c1 from t1 where c1 < $1;
PREPARE
regression=# explain (costs off, analyze) execute t1_select(1);
                          QUERY PLAN
---
 Gather (actual time=35.252..44.727 rows= loops=1)
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Seq Scan on t1 (actual time=0.364..5.638 rows= loops=1)
         Filter: (c1 < 1)
         Rows Removed by Filter: 1
 Planning time: 2135.514 ms
 Execution time: 52.886 ms
(9 rows)

The next four executions give the same plan; however, the sixth execution gives the plan below:

regression=# explain (costs off, analyze) execute t1_select(10005);
                          QUERY PLAN
--
 Seq Scan on t1 (actual time=0.049..6.188 rows=1 loops=1)
   Filter: (c1 < $1)
 Planning time: 22577.404 ms
 Execution time: 7.612 ms
(4 rows)

Attached patch fix_parallel_safety_for_extern_params_v1.patch fixes this problem. Note that, for now, I have added a very simple test to the regression tests to cover the prepared statement case, but we might want to add something related to generic plan selection, as I have shown above. I am just not sure if it is a good idea to have multiple executions just to get the generic plan. After fixing this problem, when I ran the regression tests with force_parallel_mode = regress, I saw multiple other failures. 
All the failures boil down to two kinds of cases: 1. There was an assumption, while extracting simple expressions, that the target list of a Gather node can contain only constants or Vars. Once the above patch allows extern params as parallel-safe, that assumption no longer holds true; we need to handle Params as well. Attached patch fix_simple_expr_interaction_gather_v1.patch handles that case. 2. We don't allow execution to use parallelism if the plan can be executed multiple times. This is enforced in ExecutePlan, but it seems we fail to handle the case where we are already in parallel mode by the time we enforce that condition. As a result, parallelism is allowed when it shouldn't be (when the same plan is executed multiple times), leading to a crash. One way to fix this is to temporarily exit parallel mode in such cases and re-enter the same state once the current execution is over. Attached patch fix_parallel_mode_nested_execution_v1.patch fixes this problem. Thanks to Kuntal, who helped investigate the regression failures that resulted from treating PARAM_EXTERN params as parallel-safe. [1] - https://www.postgresql.org/message-id/CA%2BTgmobSN66o2_c76rY12JvS_sZa17zpKvpuyG-QzRvVLYE8Vg%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com fix_parallel_safety_for_extern_params_v1.patch Description: Binary data fix_simple_expr_interaction_gather_v1.patch Description: Binary data fix_parallel_mode_nested_execution_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] BLK_DONE state in XLogReadBufferForRedoExtended
Today, I was trying to think about cases when we can return BLK_DONE in XLogReadBufferForRedoExtended. One thing that occurred to me is that it can happen during the replay of WAL if full_page_writes is off. Basically, when full_page_writes is on, crash recovery will always first restore the full-page image of the page and then apply the WAL corresponding to that page, so we will never hit the case where the LSN of the page is greater than the LSN of the WAL record. Are there other cases in which we can get the BLK_DONE state? Is it mentioned somewhere in the code/comments that I am missing? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Append implementation
On Fri, Oct 6, 2017 at 12:03 PM, Amit Khandekar wrote: > On 6 October 2017 at 08:49, Amit Kapila wrote: >> >> Okay, but why not cheapest partial path? > > I gave some thought on this point. Overall I feel it does not matter > which partial path it should pick up. Right now the partial paths are > not ordered. But for non-partial paths sake, we are just choosing the > very last path. So in case of mixed paths, leader will get a partial > path, but that partial path would not be the cheapest path. But if we > also order the partial paths, the same logic would then pick up > cheapest partial path. The question is, should we also order the > partial paths for the leader ? > > The only scenario I see where leader choosing cheapest partial path > *might* show some benefit, is if there are some partial paths that > need to do some startup work using only one worker. I think currently, > parallel hash join is one case where it builds the hash table, but I > guess here also, we support parallel hash build, but not sure about > the status. > You also need to consider how merge join currently works in parallel (each worker needs to perform the whole work of the right side). I think there could be more scenarios where the startup cost is high and each parallel worker needs to do that work independently. For such plan, if leader starts it, it would be slow, and > no other worker would be able to help it, so its actual startup cost > would be drastically increased. (Another path is parallel bitmap heap > scan where the leader has to do something and the other workers wait. > But here, I think it's not much work for the leader to do). So > overall, to handle such cases, it's better for leader to choose a > cheapest path, or may be, a path with cheapest startup cost. We can > also consider sorting partial paths with decreasing startup cost. > Yeah, that sounds reasonable. -- With Regards, Amit Kapila. 
EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallelize queries containing initplans
On Sat, Oct 7, 2017 at 7:22 PM, Robert Haas wrote: > On Fri, Oct 6, 2017 at 7:08 AM, Amit Kapila wrote: >> I have fixed the other review comment related to using safe_param_list >> in the attached patch. I think I have fixed all comments given by >> you, but let me know if I have missed anything or you have any other >> comment. > > -Param *param = (Param *) node; > +if (list_member_int(context->safe_param_ids, ((Param *) > node)->paramid)) > +return false; > > -if (param->paramkind != PARAM_EXEC || > -!list_member_int(context->safe_param_ids, param->paramid)) > -{ > -if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context)) > -return true; > -} > -return false;/* nothing to recurse to */ > +if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context)) > +return true; > > I think your revised logic is wrong here because it drops the > paramkind test, and I don't see any justification for that. > I have dropped the check thinking that Param extern params will be always safe and for param exec params we now have a list of safe params, so we don't need this check and now again thinking about it, it seems I might not be right. OTOH, I am not sure if the check as written is correct for all cases, however, I think for the purpose of this patch, we can leave it as it is and discuss separately what should be the check (as suggested by you). I have reverted the check in the attached patch. > > But I'm also wondering if we're missing a trick here, because isn't a > PARAM_EXTERN parameter always safe, given SerializeParamList? > Right. > If so, > this || ought to be &&, but if that adjustment is needed, it seems > like a separate patch. > How will it work if, during planning, we encounter param_extern param in any list? Won't it return true in that case, because param extern params will not be present in safe_param_ids, so this check will be true and then max_parallel_hazard_test will also return true? How about always returning false for PARAM_EXTERN? 
-- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com pq_pushdown_initplan_v12.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallelize queries containing initplans
On Fri, Oct 6, 2017 at 2:32 AM, Robert Haas wrote: > On Thu, Oct 5, 2017 at 1:16 PM, Amit Kapila wrote: >> On Thu, Oct 5, 2017 at 6:08 PM, Robert Haas wrote: >>> On Thu, Oct 5, 2017 at 5:52 AM, Amit Kapila wrote: >>>> Now, unless, I am missing something here, it won't be possible to >>>> detect params in such cases during forming of join rels and hence we >>>> need the tests in generate_gather_paths. Let me know if I am missing >>>> something in this context or if you have any better ideas to make it >>>> work? >>> >>> Hmm, in a case like this, I think we shouldn't need to detect it. The >>> Var is perfectly parallel-safe, the problem is that we can't use a >>> not-safe plan for the inner rel. I wonder why that's happening >>> here... >> >> It is because the inner rel (Result Path) contains a reference to a >> param which appears to be at the same query level. Basically due to >> changes in max_parallel_hazard_walker(). > > I spent several hours debugging this issue this afternoon. > Thanks a lot for spending time. > I think > you've misdiagnosed the problem. I think that the Param reference in > the result path is parallel-safe; that doesn't seem to me to be wrong. > If we see a Param reference for our own query level, then either we're > below the Gather and the new logic added by this patch will pass down > the value or we're above the Gather and we can access the value > directly. Either way, no problem. > > However, I think that if you attach an InitPlan to a parallel-safe > plan, it becomes parallel-restricted. This is obvious in the case > where the InitPlan's plan isn't itself parallel-safe, but it's also > arguably true even when the InitPlan's plan *is* parallel-safe, > because pushing that below a Gather introduces a multiple-evaluation > hazard. I think we should fix that problem someday by providing a > DSA-based parameter store, but that's a job for another day. 
> > And it turns out that we actually have such logic already, but this > patch removes it: > > @@ -2182,7 +2181,6 @@ SS_charge_for_initplans(PlannerInfo *root, > RelOptInfo *final_rel) > > path->startup_cost += initplan_cost; > path->total_cost += initplan_cost; > - path->parallel_safe = false; > } > > /* We needn't do set_cheapest() here, caller will do it */ > Yeah, that is a mistake on my side. I thought we don't need it now, as we pass the computed value of initplan params for valid cases, and for the other cases we can prohibit them, as was done in the patch. > Now, the header comment for SS_charge_for_initplans() is wrong; it > claims we can't transmit initPlans to workers, but I think that's > already wrong even before this patch. > What exactly do you want to convey as part of that comment? I have fixed the other review comment related to using safe_param_list in the attached patch. I think I have addressed all of your comments, but let me know if I have missed anything or you have any other comments. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com pq_pushdown_initplan_v11.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Append implementation
On Thu, Oct 5, 2017 at 4:11 PM, Amit Khandekar wrote: > > Ok. How about removing pa_all_partial_subpaths altogether , and > instead of the below condition : > > /* > * If all the child rels have partial paths, and if the above Parallel > * Append path has a mix of partial and non-partial subpaths, then consider > * another Parallel Append path which will have *all* partial subpaths. > * If enable_parallelappend is off, make this one non-parallel-aware. > */ > if (partial_subpaths_valid && !pa_all_partial_subpaths) > .. > > Use this condition : > if (partial_subpaths_valid && pa_nonpartial_subpaths != NIL) > .. > Sounds good to me. One minor point: + if (!node->as_padesc) + { + /* + */ + if (!exec_append_seq_next(node)) + return ExecClearTuple(node->ps.ps_ResultTupleSlot); + } It seems either you want to add a comment in the above part of the patch, or you left the empty /* */ there by mistake. > > > > Regarding a mix of partial and non-partial paths, I feel it always > makes sense for the leader to choose the partial path. > Okay, but why not the cheapest partial path? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallelize queries containing initplans
On Thu, Oct 5, 2017 at 6:08 PM, Robert Haas wrote: > On Thu, Oct 5, 2017 at 5:52 AM, Amit Kapila wrote: >> Now, unless, I am missing something here, it won't be possible to >> detect params in such cases during forming of join rels and hence we >> need the tests in generate_gather_paths. Let me know if I am missing >> something in this context or if you have any better ideas to make it >> work? > > Hmm, in a case like this, I think we shouldn't need to detect it. The > Var is perfectly parallel-safe, the problem is that we can't use a > not-safe plan for the inner rel. I wonder why that's happening > here... > It is because the inner rel (Result Path) contains a reference to a param which appears to be at the same query level. Basically due to changes in max_parallel_hazard_walker(). -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Append implementation
On Thu, Oct 5, 2017 at 6:14 PM, Robert Haas wrote: > On Thu, Oct 5, 2017 at 6:29 AM, Amit Kapila wrote: >> Okay, but can't we try to pick the cheapest partial path for master >> backend and maybe master backend can try to work on a partial path >> which is already picked up by some worker. > > Well, the master backend is typically going to be the first process to > arrive at the Parallel Append because it's already running, whereas > the workers have to start. > Sure, the leader can pick the cheapest partial path to start with. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Append implementation
On Mon, Oct 2, 2017 at 6:21 PM, Robert Haas wrote: > On Sun, Oct 1, 2017 at 9:55 AM, Amit Kapila wrote: >> Isn't it for both? I mean it is about comparing the non-partial paths >> for child relations of the same relation and also when there are >> different relations involved as in Union All kind of query. In any >> case, the point I was trying to say is that generally non-partial >> relations will have relatively smaller scan size, so probably should >> take lesser time to complete. > > I don't think that's a valid inference. It's true that a relation > could fail to have a partial path because it's small, but that's only > one reason among very many. The optimal index type could be one that > doesn't support parallel index scans, for example. > Valid point. >> Sure, I think it is quite good if we can achieve that but it seems to >> me that we will not be able to achieve that in all scenario's with the >> patch and rather I think in some situations it can result in leader >> ended up picking the costly plan (in case when there are all partial >> plans or mix of partial and non-partial plans). Now, we are ignoring >> such cases based on the assumption that other workers might help to >> complete master backend. I think it is quite possible that the worker >> backends picks up some plans which emit rows greater than tuple queue >> size and they instead wait on the master backend which itself is busy >> in completing its plan. So master backend will end up taking too much >> time. If we want to go with a strategy of master (leader) backend and >> workers taking a different strategy to pick paths to work on, then it >> might be better if we should try to ensure that master backend always >> starts from the place which has cheapest plans irrespective of whether >> the path is partial or non-partial. > > I agree that it's complicated to get this right in all cases; I'm > realizing that's probably an unattainable ideal. 
> > However, I don't think ignoring the distinction between partial and > non-partial plans is an improvement, because the argument that other > workers may be able to help the leader is a correct one. You are > correct in saying that the workers might fill up their tuple queues > while the leader is working, but once the leader returns one tuple it > will switch to reading from the queues. Every other tuple can be > returned by some worker. On the other hand, if the leader picks a > non-partial plan, it must run that plan through to completion. > > Imagine (a) a non-partial path with a cost of 1000 returning 100 > tuples and (b) a partial path with a cost of 10,000 returning 1,000 > tuples. No matter which path the leader picks, it has about 10 units > of work to do to return 1 tuple. However, if it picks the first path, > it is committed to doing 990 more units of work later, regardless of > whether the workers have filled the tuple queues or not. If it picks > the second path, it again has about 10 units of work to do to return 1 > tuple, but it hasn't committed to doing all the rest of the work of > that path. So that's better. > > Now it's hard to get all of the cases right. If the partial path in > the previous example had a cost of 1 crore then even returning 1 tuple > is more costly than running the whole non-partial path. When you > introduce partition-wise join and parallel hash, there are even more > problems. Now the partial path might have a large startup cost, so > returning the first tuple may be very expensive, and that work may > help other workers (if this is a parallel hash) or it may not (if this > is a non-parallel hash). > Yeah, these were the type of cases I am also worried. > I don't know that we can get all of these > cases right, or that we should try. 
However, I still think that > picking partial paths preferentially is sensible -- we don't know all > the details, but we do know that they're typically going to at least > try to divide up the work in a fine-grained fashion, while a > non-partial path, once started, the leader must run it to completion. > Okay, but can't we try to pick the cheapest partial path for the master backend? And maybe the master backend can try to work on a partial path that has already been picked up by some worker. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallelize queries containing initplans
On Wed, Oct 4, 2017 at 12:55 PM, Amit Kapila wrote: > On Wed, Oct 4, 2017 at 3:40 AM, Robert Haas wrote: >> On Tue, Oct 3, 2017 at 7:33 AM, Amit Kapila wrote: >> >> Having said all that, I think that this patch only wants to handle the >> subset of cases (2) and (4) where the relevant InitPlan is attached >> ABOVE the Gather node -- which seems very reasonable, because >> evaluating an InitPlan at a level of the plan tree above the level at >> which it is defined sounds like it might be complex. But I still >> don't quite see why we need these tests. I mean, if we only allow >> Param references from a set of safe parameter IDs, and the safe >> parameter IDs include only those IDs that can be generated in a >> worker, then won't other InitPlans in the workers anyway be ruled out? > > It doesn't happen always. There are cases when the part of required > conditions are pushed one query level below, so when we check during > max_parallel_hazard_walker, they look safe, but actually, they are > not. I will try to explain by example: > > postgres=# explain (costs off, verbose) select * from t1 where t1.i in > ( select 1 + (select max(j) from t3)); > QUERY PLAN > -- > Hash Semi Join >Output: t1.i, t1.j, t1.k >Hash Cond: (t1.i = ((1 + $1))) >-> Seq Scan on public.t1 > Output: t1.i, t1.j, t1.k >-> Hash > Output: ((1 + $1)) > -> Result >Output: (1 + $1) >InitPlan 1 (returns $1) > -> Finalize Aggregate >Output: max(t3.j) >-> Gather > Output: (PARTIAL max(t3.j)) > Workers Planned: 2 > -> Partial Aggregate >Output: PARTIAL max(t3.j) >-> Parallel Seq Scan on public.t3 > Output: t3.j > (19 rows) > > In the above example, you can see that the condition referring to > initplan (1 + $1) is pushed one level below. So when it tries to > check parallel safety for the join condition, it won't see Param node. > To add more detail, the param node in join qual is replaced with Var during pullup of sublink. 
Basically, it constructs Vars from the target list of the subquery and then substitutes them into the join quals. Refer to this code:

convert_ANY_sublink_to_join()
{
..
	/*
	 * Build a list of Vars representing the subselect outputs.
	 */
	subquery_vars = generate_subquery_vars(root,
										   subselect->targetList,
										   rtindex);

	/*
	 * Build the new join's qual expression, replacing Params with these Vars.
	 */
	quals = convert_testexpr(root, sublink->testexpr, subquery_vars);
..
}

Now, unless I am missing something here, it won't be possible to detect params in such cases during the forming of join rels, and hence we need the tests in generate_gather_paths. Let me know if I am missing something in this context or if you have any better ideas to make it work. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
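To make the substitution that convert_testexpr() performs more concrete, here is a hedged, self-contained sketch. It is a toy model, not PostgreSQL code: the node types and the function name replace_params_with_vars are invented for illustration, and real Param/Var/OpExpr nodes carry much more state.

```c
#include <assert.h>
#include <stddef.h>

/* Toy expression tree, invented for illustration only. */
typedef enum { T_PARAM, T_VAR, T_OP } Tag;

typedef struct Expr
{
	Tag			tag;
	int			id;			/* param number or var number */
	struct Expr *left;
	struct Expr *right;
} Expr;

/*
 * Walk the tree and turn every Param leaf into the Var built for the
 * corresponding subquery output -- the spirit of the substitution that
 * convert_testexpr() performs on the sublink's test expression.
 */
static void
replace_params_with_vars(Expr *e, const int subquery_var_ids[])
{
	if (e == NULL)
		return;
	if (e->tag == T_PARAM)
	{
		e->tag = T_VAR;
		e->id = subquery_var_ids[e->id];	/* Param $n -> n-th output Var */
		return;
	}
	replace_params_with_vars(e->left, subquery_var_ids);
	replace_params_with_vars(e->right, subquery_var_ids);
}
```

After this rewrite, a later walker looking for Param nodes in the join qual finds none, which mirrors why max_parallel_hazard_walker judges the pushed-down condition safe.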
Re: [HACKERS] parallelize queries containing initplans
On Wed, Oct 4, 2017 at 12:55 PM, Amit Kapila wrote: > On Wed, Oct 4, 2017 at 3:40 AM, Robert Haas wrote: >> On Tue, Oct 3, 2017 at 7:33 AM, Amit Kapila wrote: >> >> Having said all that, I think that this patch only wants to handle the >> subset of cases (2) and (4) where the relevant InitPlan is attached >> ABOVE the Gather node -- which seems very reasonable, because >> evaluating an InitPlan at a level of the plan tree above the level at >> which it is defined sounds like it might be complex. But I still >> don't quite see why we need these tests. I mean, if we only allow >> Param references from a set of safe parameter IDs, and the safe >> parameter IDs include only those IDs that can be generated in a >> worker, then won't other InitPlans in the workers anyway be ruled out? > > It doesn't happen always. There are cases when the part of required > conditions are pushed one query level below, so when we check during > max_parallel_hazard_walker, they look safe, but actually, they are > not. I will try to explain by example: > > postgres=# explain (costs off, verbose) select * from t1 where t1.i in > ( select 1 + (select max(j) from t3)); If you want to reproduce this case, then you can use the script posted by Kuntal up thread [1]. [1] - https://www.postgresql.org/message-id/CAGz5QC%2BuHOq78GCika3fbgRyN5zgiDR9Dd1Th5kENF%2BUpnPomQ%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallelize queries containing initplans
On Wed, Oct 4, 2017 at 3:40 AM, Robert Haas wrote: > On Tue, Oct 3, 2017 at 7:33 AM, Amit Kapila wrote: > > Having said all that, I think that this patch only wants to handle the > subset of cases (2) and (4) where the relevant InitPlan is attached > ABOVE the Gather node -- which seems very reasonable, because > evaluating an InitPlan at a level of the plan tree above the level at > which it is defined sounds like it might be complex. But I still > don't quite see why we need these tests. I mean, if we only allow > Param references from a set of safe parameter IDs, and the safe > parameter IDs include only those IDs that can be generated in a > worker, then won't other InitPlans in the workers anyway be ruled out? It doesn't always happen. There are cases where part of the required conditions is pushed one query level below, so when we check during max_parallel_hazard_walker, they look safe, but actually they are not. I will try to explain by example:

postgres=# explain (costs off, verbose) select * from t1 where t1.i in
( select 1 + (select max(j) from t3));
                         QUERY PLAN
------------------------------------------------------------
 Hash Semi Join
   Output: t1.i, t1.j, t1.k
   Hash Cond: (t1.i = ((1 + $1)))
   ->  Seq Scan on public.t1
         Output: t1.i, t1.j, t1.k
   ->  Hash
         Output: ((1 + $1))
         ->  Result
               Output: (1 + $1)
               InitPlan 1 (returns $1)
                 ->  Finalize Aggregate
                       Output: max(t3.j)
                       ->  Gather
                             Output: (PARTIAL max(t3.j))
                             Workers Planned: 2
                             ->  Partial Aggregate
                                   Output: PARTIAL max(t3.j)
                                   ->  Parallel Seq Scan on public.t3
                                         Output: t3.j
(19 rows)

In the above example, you can see that the condition referring to the initplan, (1 + $1), is pushed one level below. So when we check parallel safety for the join condition, we won't see the Param node. Now, consider what happens if we don't check contains_parallel_unsafe_param during generate_gather_paths: it will lead to the plan below.
postgres=# explain (costs off, verbose) select * from t1 where t1.i in
( select 1 + (select max(j) from t3));
                         QUERY PLAN
------------------------------------------------------------
 Gather
   Output: t1.i, t1.j, t1.k
   Workers Planned: 2
   ->  Hash Semi Join
         Output: t1.i, t1.j, t1.k
         Hash Cond: (t1.i = ((1 + $1)))
         ->  Parallel Seq Scan on public.t1
               Output: t1.i, t1.j, t1.k
         ->  Hash
               Output: ((1 + $1))
               ->  Result
                     Output: (1 + $1)
                     InitPlan 1 (returns $1)
                       ->  Finalize Aggregate
                             Output: max(t3.j)
                             ->  Gather
                                   Output: (PARTIAL max(t3.j))
                                   Workers Planned: 2
                                   ->  Partial Aggregate
                                         Output: PARTIAL max(t3.j)
                                         ->  Parallel Seq Scan on public.t3
                                               Output: t3.j
(22 rows)

This is wrong, because when we try to evaluate the params required at the Gather node, we won't get the required param, as there is no initplan at that level. > > If I am all mixed up, please help straighten me out. > I think whatever you said is right and very clear. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
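The "safe parameter IDs" idea quoted in this exchange can be sketched with a toy walker: before generating a Gather path, check that every Param referenced in the subtree is one whose value can be computed above the Gather and shipped to workers. This is an illustrative model only, assuming a simplified binary plan tree; none of these types or names exist in PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PARAMS 8			/* illustrative bound, not a real limit */

/* Toy plan node: records only which param IDs its expressions reference. */
typedef struct Node
{
	int			used_params[MAX_PARAMS];
	int			nused;
	struct Node *left;
	struct Node *right;
} Node;

/*
 * Return true only if every Param referenced below a proposed Gather is
 * in the safe set, i.e. its value can be evaluated above the Gather (by
 * an initplan attached there) and passed down to the workers.
 */
static bool
params_are_safe(const Node *n, const bool safe_ids[MAX_PARAMS])
{
	if (n == NULL)
		return true;
	for (int i = 0; i < n->nused; i++)
		if (!safe_ids[n->used_params[i]])
			return false;
	return params_are_safe(n->left, safe_ids) &&
		   params_are_safe(n->right, safe_ids);
}
```

In the broken plan above, the join qual references $1 but no initplan at or above the Gather defines it, which this kind of check would reject.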
Re: [HACKERS] parallelize queries containing initplans
On Mon, Oct 2, 2017 at 8:13 PM, Robert Haas wrote: > On Thu, Sep 14, 2017 at 10:45 PM, Amit Kapila wrote: >> The latest patch again needs to be rebased. Find rebased patch >> attached with this email. > > I read through this patch this morning. Copying Tom in the hopes > that he might chime in on the following two issues in particular: > It seems you forgot to include Tom, included now. > 1. Is there any superior alternative to adding ptype to ParamExecData? > I think the reason why we don't have this already is because the plan > tree that populates the Param must output the right type and the plan > tree that reads the Param must expect the right type, and there's no > need for anything else. But for serialization and deserialization > this seems to be necessary. I wonder whether it would be better to > try to capture this at the time the Param is generated (e.g. > var->vartype in assign_param_for_var) rather than derived at execution > time by applying exprType(), but I'm not sure how that would work > exactly, or whether it's really any better. > > 2. Do max_parallel_hazard_walker and set_param_references() have the > right idea about which parameters are acceptable candidates for this > optimization? The idea seems to be to collect the setParam sets for > all initplans between the current query level and the root. That > looks correct to me but I'm not an expert on this parameter stuff. > > Some other comments: > > - I think parallel_hazard_walker should likely work by setting > safe_param_ids to the right set of parameter IDs rather than testing > whether the parameter is safe by checking either whether it is in > safe_param_ids or some other condition is met. > Okay, I think it should work the way you are suggesting. I will give it a try. > - contains_parallel_unsafe_param() sounds like a function that merely > returns true or false, but it actually has major side effects. 
Those > side effects also look unsafe; mutating previously-generated paths can > corrupt the rel's pathlist, because it will no longer have the sort > order and other characteristics that add_path() creates and upon which > other code relies. > > - Can't is_initplan_is_below_current_query_level() be confused when > there are multiple subqueries in the tree? For example if the > toplevel query has subqueries a and b and a has a sub-subquery aa > which has an initplan, won't this function think that b has an > initplan below the current query level? If not, maybe a comment is in > order explaining why not? > We can discuss the above two points once you are convinced about whether we need any such functions, so let's first discuss that. > - Why do we even need contains_parallel_unsafe_param() and > is_initplan_is_below_current_query_level() in the first place, anyway? > I think there's been some discussion of that on this thread, but I'm > not sure I completely understand it, and the comments in the patch > don't help me understand why we now need this restriction. > This is to ensure that initplans are never below a Gather node. If we allow parallelism when an initplan is below the current query level, then (a) we need a way to pass the result of the initplan from one worker to the other workers and to the master backend, and (b) if the initplan contains a reference to some parameter above the current query level (undirect correlated plans), we need a mechanism to pass such parameters (basically, to make correlated initplans work). Now, one might think (a) and/or (b) are desirable so that the feature can be used in more cases, but based on TPC-H analysis we have found that it is quite useful even without those, and in fact the additional benefit after supporting those cases might not be significant.
The patch contains a few comments in generate_gather_paths and at the top of the functions contains_parallel_unsafe_param and is_initplan_is_below_current_query_level; if you feel it should be explained in further detail, then let me know. > - The new code in ExplainPrintPlan() needs a comment. > There was a comment, but I have added a somewhat more detailed comment now; check whether that makes it clear. > - I have typically referred in comments to "Gather or Gather Merge" > rather than "gather or gather merge". > Changed as per the suggestion. Also changed the function name is_initplan_is_below_current_query_level to is_initplan_below_current_query_level. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com pq_pushdown_initplan_v10.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] crash in RestoreLibraryState during low-memory testing
On Tue, Oct 3, 2017 at 8:31 AM, Amit Kapila wrote: > On Tue, Oct 3, 2017 at 3:04 AM, Andreas Seltenreich > wrote: >> Hi, >> >> doing low-memory testing with REL_10_STABLE at 1f19550a87 also produced >> a couple of parallel worker core dumps with the backtrace below. >> Although most of the backtrace is inside the dynamic linker, it looks >> like it was passed a pointer to gone-away shared memory. >> > > It appears to be some dangling pointer, but not sure how it is > possible. Can you provide some more details, like do you have any > other library which you want to get loaded in the backend (like by > using shared_preload_libraries or by some other way)? I think without > that we shouldn't try to load anything in the parallel worker. > Another possibility could be that the memory for the library space has been overwritten, either in the master backend or in the worker backend. I think that is possible in low-memory conditions if somewhere we write to memory without ensuring that the space is allocated. I have browsed the nearby code and didn't find any such instance. One idea to narrow down the problem is to see whether the other members in the worker backend are sane; for example, can you try printing the value of MyFixedParallelState, since we get that value from shared memory similarly to the library space? It seems from the call stack that the memory of the library space is corrupted, so we could move the call to lookup/RestoreLibraryState to immediately after we assign MyFixedParallelState. If even after this the memory for the library space is corrupted, then probably something bad has happened in the master backend. Any other ideas? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] crash in RestoreLibraryState during low-memory testing
On Tue, Oct 3, 2017 at 3:04 AM, Andreas Seltenreich wrote: > Hi, > > doing low-memory testing with REL_10_STABLE at 1f19550a87 also produced > a couple of parallel worker core dumps with the backtrace below. > Although most of the backtrace is inside the dynamic linker, it looks > like it was passed a pointer to gone-away shared memory. > It appears to be some dangling pointer, but I am not sure how that is possible. Can you provide some more details, such as whether you have any other library that gets loaded in the backend (for example, via shared_preload_libraries or some other way)? I think without that we shouldn't be trying to load anything in the parallel worker. Also, if you can get the failing query (check the server log), it would be great. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Append implementation
On Sat, Sep 30, 2017 at 9:25 PM, Robert Haas wrote: > On Sat, Sep 30, 2017 at 12:20 AM, Amit Kapila wrote: >> Okay, but the point is whether it will make any difference >> practically. Let us try to see with an example, consider there are >> two children (just taking two for simplicity, we can extend it to >> many) and first having 1000 pages to scan and second having 900 pages >> to scan, then it might not make much difference which child plan >> leader chooses. Now, it might matter if the first child relation has >> 1000 pages to scan and second has just 1 page to scan, but not sure >> how much difference will it be in practice considering that is almost >> the maximum possible theoretical difference between two non-partial >> paths (if we have pages greater than 1024 pages >> (min_parallel_table_scan_size) then it will have a partial path). > > But that's comparing two non-partial paths for the same relation -- > the point here is to compare across relations. Isn't it both? I mean, it is about comparing the non-partial paths for child relations of the same relation, and also about cases where different relations are involved, as in a UNION ALL kind of query. In any case, the point I was trying to make is that relations with non-partial paths will generally have a relatively small scan size, so they should probably take less time to complete. > Also keep in mind > scenarios like this: > > SELECT ... FROM relation UNION ALL SELECT ... FROM generate_series(...); > I think for the FunctionScan case, non-partial paths can be quite costly. >>> It's a lot fuzzier what is best when there are only partial plans. >>> >> >> The point that bothers me a bit is whether it is a clear win if we >> allow the leader to choose a different strategy to pick the paths or >> is this just our theoretical assumption. Basically, I think the patch >> will become simpler if pick some simple strategy to choose paths. > > Well, that's true, but is it really that much complexity?
> > And I actually don't see how this is very debatable. If the only > paths that are reasonably cheap are GIN index scans, then the only > strategy is to dole them out across the processes you've got. Giving > the leader the cheapest one seems to be to be clearly smarter than any > other strategy. > Sure, it would be quite good if we can achieve that, but it seems to me that we will not be able to achieve it in all scenarios with the patch; rather, I think in some situations it can result in the leader ending up picking a costly plan (when there are all partial plans, or a mix of partial and non-partial plans). Now, we are ignoring such cases based on the assumption that other workers might help the master backend complete. But it is quite possible that the worker backends pick up plans that emit more rows than the tuple queue size, and they then wait on the master backend, which is itself busy completing its own plan. So the master backend will end up taking too much time. If we want to go with the master (leader) backend and the workers using different strategies to pick paths to work on, then it might be better to ensure that the master backend always starts from the place that has the cheapest plans, irrespective of whether the path is partial or non-partial. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Append implementation
On Wed, Sep 20, 2017 at 10:59 AM, Amit Khandekar wrote: > On 16 September 2017 at 10:42, Amit Kapila wrote: >> >> At a broader level, the idea is good, but I think it won't turn out >> exactly like that considering your below paragraph which indicates >> that it is okay if leader picks a partial path that is costly among >> other partial paths as a leader won't be locked with that. >> Considering this is a good design for parallel append, the question is >> do we really need worker and leader to follow separate strategy for >> choosing next path. I think the patch will be simpler if we can come >> up with a way for the worker and leader to use the same strategy to >> pick next path to process. How about we arrange the list of paths >> such that first, all partial paths will be there and then non-partial >> paths and probably both in decreasing order of cost. Now, both leader >> and worker can start from the beginning of the list. In most cases, >> the leader will start at the first partial path and will only ever >> need to scan non-partial path if there is no other partial path left. >> This is not bulletproof as it is possible that some worker starts >> before leader in which case leader might scan non-partial path before >> all partial paths are finished, but I think we can avoid that as well >> if we are too worried about such cases. > > If there are no partial subpaths, then again the leader is likely to > take up the expensive subpath. And this scenario would not be > uncommon. > While thinking about how common the case of no partial subpaths would be, it occurred to me that as of now we always create a partial path for the inheritance child if it is parallel-safe and the user has not explicitly set the value of parallel_workers to zero (refer compute_parallel_worker). So, unless you are planning to change that, I think it will be quite uncommon to have no partial subpaths. Few nitpicks in your latest patch: 1. 
@@ -298,6 +366,292 @@ ExecReScanAppend(AppendState *node)
 	if (subnode->chgParam == NULL)
 		ExecReScan(subnode);
 }
+

Looks like a spurious line.

2.
@@ -1285,7 +1291,11 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
..
+	if (chosen_path && chosen_path != cheapest_partial_path)
+		pa_all_partial_subpaths = false;

This will keep setting pa_all_partial_subpaths to false for non-partial paths, which doesn't seem to be the purpose of this variable. I think you want it to be set even when there is just one non-partial path, so isn't it better to write it as below or something similar:

if (pa_nonpartial_subpaths && pa_all_partial_subpaths)
	pa_all_partial_subpaths = false;

-- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Append implementation
On Sat, Sep 30, 2017 at 4:02 AM, Robert Haas wrote: > On Fri, Sep 29, 2017 at 7:48 AM, Amit Kapila wrote: >> I think in general the non-partial paths should be cheaper as compared >> to partial paths as that is the reason planner choose not to make a >> partial plan at first place. I think the idea patch is using will help >> because the leader will choose to execute partial path in most cases >> (when there is a mix of partial and non-partial paths) and for that >> case, the leader is not bound to complete the execution of that path. >> However, if all the paths are non-partial, then I am not sure much >> difference it would be to choose one path over other. > > The case where all plans are non-partial is the case where it matters > most! If the leader is going to take a share of the work, we want it > to take the smallest share possible. > Okay, but the point is whether it will make any difference practically. Let us try to see with an example: consider there are two children (just taking two for simplicity; we can extend it to many), the first having 1000 pages to scan and the second having 900. Then it might not make much difference which child plan the leader chooses. Now, it might matter if the first child relation has 1000 pages to scan and the second has just 1 page, but I am not sure how much difference it will make in practice, considering that is almost the maximum possible theoretical difference between two non-partial paths (if a relation has more than 1024 pages (min_parallel_table_scan_size), then it will have a partial path). > It's a lot fuzzier what is best when there are only partial plans. > The point that bothers me a bit is whether it is a clear win to allow the leader to choose a different strategy to pick the paths, or whether this is just our theoretical assumption. Basically, I think the patch will become simpler if we pick some simple strategy to choose paths. -- With Regards, Amit Kapila. 
EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Append implementation
On Wed, Sep 20, 2017 at 10:59 AM, Amit Khandekar wrote: > On 16 September 2017 at 10:42, Amit Kapila wrote: >> On Thu, Sep 14, 2017 at 9:41 PM, Robert Haas wrote: >>> On Mon, Sep 11, 2017 at 9:25 AM, Amit Kapila >>> wrote: >>>> I think the patch stores only non-partial paths in decreasing order, >>>> what if partial paths having more costs follows those paths? >>> >>> The general picture here is that we don't want the leader to get stuck >>> inside some long-running operation because then it won't be available >>> to read tuples from the workers. On the other hand, we don't want to >>> just have the leader do no work because that might be slow. And in >>> most cast cases, the leader will be the first participant to arrive at >>> the Append node, because of the worker startup time. So the idea is >>> that the workers should pick expensive things first, and the leader >>> should pick cheap things first. >>> >> >> At a broader level, the idea is good, but I think it won't turn out >> exactly like that considering your below paragraph which indicates >> that it is okay if leader picks a partial path that is costly among >> other partial paths as a leader won't be locked with that. >> Considering this is a good design for parallel append, the question is >> do we really need worker and leader to follow separate strategy for >> choosing next path. I think the patch will be simpler if we can come >> up with a way for the worker and leader to use the same strategy to >> pick next path to process. How about we arrange the list of paths >> such that first, all partial paths will be there and then non-partial >> paths and probably both in decreasing order of cost. Now, both leader >> and worker can start from the beginning of the list. In most cases, >> the leader will start at the first partial path and will only ever >> need to scan non-partial path if there is no other partial path left. 
>> This is not bulletproof as it is possible that some worker starts >> before leader in which case leader might scan non-partial path before >> all partial paths are finished, but I think we can avoid that as well >> if we are too worried about such cases. > > If there are no partial subpaths, then again the leader is likely to > take up the expensive subpath. > I think in general the non-partial paths should be cheaper compared to partial paths, as that is the reason the planner chose not to make a partial plan in the first place. I think the idea the patch is using will help, because the leader will choose to execute a partial path in most cases (when there is a mix of partial and non-partial paths), and in that case the leader is not bound to complete the execution of that path. However, if all the paths are non-partial, then I am not sure how much difference it would make to choose one path over another. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
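The path-ordering proposal quoted above (all partial paths first, then non-partial ones, each group in decreasing cost order, with leader and workers both scanning from the front) could be expressed as a sort comparator along these lines. This is a hedged sketch with invented types; the real patch works on planner Path lists, not this toy struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy subpath descriptor; the real planner works with Path nodes. */
typedef struct SubPath
{
	double cost;
	bool   partial;
} SubPath;

/*
 * qsort comparator implementing the proposed ordering: all partial
 * paths before all non-partial ones, each group sorted by decreasing
 * cost.  Workers starting at the front grab expensive work first,
 * while the leader mostly ends up on partial paths, which it is not
 * obliged to run to completion.
 */
static int
subpath_cmp(const void *a, const void *b)
{
	const SubPath *pa = (const SubPath *) a;
	const SubPath *pb = (const SubPath *) b;

	if (pa->partial != pb->partial)
		return pa->partial ? -1 : 1;
	if (pa->cost != pb->cost)
		return (pa->cost > pb->cost) ? -1 : 1;
	return 0;
}
```

With this ordering, the leader only ever reaches a non-partial path when no partial path remains, which is the behavior discussed in the quoted message.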
Re: [HACKERS] Setting pd_lower in GIN metapage
On Mon, Sep 25, 2017 at 12:18 PM, Amit Langote wrote: > Hi. > > Trying to catch up. > > On 2017/09/25 13:43, Michael Paquier wrote: >> On Sun, Sep 24, 2017 at 2:25 PM, Amit Kapila wrote: >>> Added and updated the comments for both btree and hash index patches. >> >> I don't have real complaints about this patch, this looks fine to me. >> >> +* Currently, the advantage of setting pd_lower is in limited cases like >> +* during wal_consistency_checking or while logging for unlogged relation >> +* as for all other purposes, we initialize the metapage. Note, it also >> +* helps in page masking by allowing to mask unused space. >> I would have reworked this comment a bit, say like that: >> Setting pd_lower is useful for two cases which make use of WAL >> compressibility even if the meta page is initialized at replay: >> - Logging of init forks for unlogged relations. >> - wal_consistency_checking logs extra full-page writes, and this >> allows masking of the unused space of the page. >> >> Now I often get complains that I suck at this exercise ;) > > So, I think I understand the discussion so far and the arguments about > what we should write to explain why we're setting pd_lower to the correct > value. > > Just to remind, I didn't actually start this thread [1] to address the > issue that the FPWs of meta pages written to WAL are not getting > compressed. An external backup tool relies on pd_lower to give the > correct starting offset of the hole to compress, provided the page's other > fields suggest it has the standard page layout. Since we discovered that > pd_lower is not set correctly in gin, brin, and spgist meta pages, I > created patches to do the same. You (Michael) pointed out that that > actually helps compress their FPW images in WAL as well, so we began > considering that. 
Also, you pointed out that WAL checking code masks > pages based on the respective masking functions' assumptions about the > page's layout properties, which the original patches forgot to consider. > So, we updated the patches to change the respective masking functions to > mask meta pages as pages with standard page layout, too. > > But as Tom pointed out [2], WAL compressibility benefit cannot be obtained > unless we change how the meta page is passed to xlog.c to be written to > WAL. So, we found out all the places that write/register the meta page to > WAL and changed the respective XLogRegisterBuffer calls to include the > REGBUG_STANDARD flag. Some of these calls already passed > REGBUF_WILL_INIT, which would result in no FPW image to be written to the > WAL and so there would be no question of compressibility. But, passing > REGBUF_STANDARD would still help the case where WAL checking is enabled, > because FPW image *is* written in that case. > > So, ISTM, comments that the patches add should all say that setting the > meta pages' pd_lower to the correct value helps to pass those pages to > xlog.c as compressible standard layout pages, regardless of whether they > are actually passed that way. Even if the patches do take care of the > latter as well. > > Did I miss something? > > Looking at Amit K's updated patches for btree and hash, it seems that he > updated the comment to say that setting pd_lower to the correct value is > *essential*, because those pages are passed as REGBUF_STANDARD pages and > hence will be compressed. That seems to contradict what I wrote above, > but I think that's not wrong, too. > I think you are missing that there are many cases where we use only REGBUF_STANDARD for meta-pages (cf. hash index). For btree, where the advantage is in fewer cases, I have explicitly stated those cases. Do you still have confusion? -- With Regards, Amit Kapila. 
EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Setting pd_lower in GIN metapage
On Mon, Sep 25, 2017 at 10:13 AM, Michael Paquier wrote: > On Sun, Sep 24, 2017 at 2:25 PM, Amit Kapila wrote: >> Added and updated the comments for both btree and hash index patches. > > I don't have real complaints about this patch, this looks fine to me. > > +* Currently, the advantage of setting pd_lower is in limited cases like > +* during wal_consistency_checking or while logging for unlogged relation > +* as for all other purposes, we initialize the metapage. Note, it also > +* helps in page masking by allowing to mask unused space. > I would have reworked this comment a bit, say like that: > Setting pd_lower is useful for two cases which make use of WAL > compressibility even if the meta page is initialized at replay: > - Logging of init forks for unlogged relations. > - wal_consistency_checking logs extra full-page writes, and this > allows masking of the unused space of the page. > > Now I often get complains that I suck at this exercise ;) > I understand that, and I think there are always multiple ways to write the same information. It might be better to pass this patch series to the committer if you don't see any mistakes, because he might change some comments anyway before committing. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
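As background for why a correct pd_lower matters in this thread: for a page registered with REGBUF_STANDARD, a full-page image can omit the "hole" between pd_lower and pd_upper. The sketch below is a simplified model of that computation, not PostgreSQL's actual PageHeaderData or WAL-assembly code; the struct and function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ 8192				/* PostgreSQL's default block size */

/*
 * Simplified stand-in for the relevant fields of PageHeaderData.
 * pd_lower/pd_upper bound the free space ("hole") on a standard page.
 */
typedef struct ToyPageHeader
{
	uint16_t pd_lower;
	uint16_t pd_upper;
} ToyPageHeader;

/*
 * Bytes a full-page image may omit when the page has the standard
 * layout.  If a metapage leaves pd_lower at the bare page-header size,
 * it cannot safely be treated as standard (the hole would swallow the
 * metadata stored right after the header); setting pd_lower past the
 * metadata makes the rest of the page an omittable hole.
 */
static unsigned
fpw_hole_size(const ToyPageHeader *ph)
{
	if (ph->pd_lower >= ph->pd_upper)
		return 0;				/* no hole: full or non-standard page */
	return (unsigned) (ph->pd_upper - ph->pd_lower);
}
```

For a metapage whose header plus metadata occupy, say, 104 bytes (an assumed figure for illustration), a correctly set pd_lower lets the FPW skip the remaining BLCKSZ - 104 bytes.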
Re: [HACKERS] Page Scan Mode in Hash Index
On Fri, Sep 22, 2017 at 11:56 PM, Robert Haas wrote: > On Thu, Sep 21, 2017 at 3:08 AM, Ashutosh Sharma > wrote: >> I have added a note for handling of logged and unlogged tables in >> README file and also corrected the header comment for >> hashbucketcleanup(). Please find the attached 0003*.patch having these >> changes. Thanks. > > I committed 0001 and 0002 with some additional edits as > 7c75ef571579a3ad7a1d3ee909f11dba5e0b9440. > I have noticed a typo in that commit (in the README), and a patch for it is attached. > I also rebased 0003 and > edited it a bit; see attached hash-cleanup-changes.patch. > > I'm not entirely sold on 0003. An alternative would be to rip the lsn > stuff back out of HashScanPosData, and I think we ought to consider > that. Basically, 0003 is betting that getting rid of the > lock-chaining in hash index vacuum is more valuable than being able to > kill dead items more aggressively. I bet that's a bad bet. > > In the case of btree indexes, since > 2ed5b87f96d473962ec5230fd820abfeaccb2069, page-at-a-time scanning > allows most btree index scans to avoid holding buffer pins when the > scan is suspended, but we gain no such advantage here. We always have > to hold a pin on the primary bucket page anyway, so even with this > patch cleanup is going to block when it hits a bucket containing a > suspended scan. 0003 helps if (a) the relation is permanent, (b) the > bucket has overflow pages, and (c) the scan is moving faster than > vacuum and can overtake it instead of waiting. But that doesn't seem > like it will happen very often at all, whereas the LSN check will > probably fail frequently and cause us to skip cleanup that we could > usefully have done. So I propose the attached hashscan-no-lsn.patch > as an alternative. > I think your proposal makes sense. Your patch looks good, but you might want to tweak the comments atop _hash_kill_items ("However, having pin on the overflow page doesn't guarantee that vacuum won't delete any items."). 
That part of the comment was written to indicate that we have to check the LSN in this function unconditionally. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com typo_hash_readme_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Setting pd_lower in GIN metapage
On Wed, Sep 20, 2017 at 9:19 AM, Amit Kapila wrote: > On Wed, Sep 20, 2017 at 4:25 AM, Michael Paquier > wrote: >> On Wed, Sep 20, 2017 at 12:47 AM, Tom Lane wrote: >>> Amit Kapila writes: >>>> On Tue, Sep 19, 2017 at 9:27 AM, Michael Paquier >>>> wrote: >>>>> I am not saying that no index AMs take advantage FPW compressibility >>>>> for their meta pages. There are cases like this one, as well as one >>>>> code path in BRIN where this is useful, and it is useful as well when >>>>> logging FPWs of the init forks for unlogged relations. >>> >>>> Hmm, why is it useful for logging FPWs of the init forks for unlogged >>>> relations? We don't use REGBUF_STANDARD in those cases. >>> >>> But if we started to do so, that would be a concrete benefit of this >>> patch ... >> >> In the proposed set of patches, all the empty() routines part of index >> AMs which use log_newpage_buffer() (brin, gin, spgst) are doing the >> right thing by updating log_newpage_buffer(). btree also should have >> its call to log_newpage updated in btbuildempty(), and your patch is >> missing that. >> > > We can add that for btree patch. > Added and updated the comments for both btree and hash index patches. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com change_metapage_usage_btree-v2.patch Description: Binary data change_metapage_usage_hash-v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
On Thu, Sep 21, 2017 at 4:14 PM, Kyotaro HORIGUCHI wrote: > > postgres=# create table test (id int primary key, v text); > postgres=# create index on test using hash (id); > WARNING: hash indexes are not WAL-logged and their use is discouraged > > But not for unlogged tables. > > postgres=# create unlogged table test (id int primary key, v text); > postgres=# create index on test using hash (id); > postgres=# (no warning) > > And it fails on promotion in the same way. > > postgres=# select * from test; > ERROR: could not open file "base/13324/16446": No such file or directory > > indexcmds.c@965:503 >> if (strcmp(accessMethodName, "hash") == 0 && >> RelationNeedsWAL(rel)) >> ereport(WARNING, >> (errmsg("hash indexes are not WAL-logged and their use is >> discouraged"))); > > Using !RelationUsesLocalBuffers instead fixes that, and the > attached patch is for 9.6. I'm a bit unsure about relying on the > logical meaning of the macro, but what it does fits here. > I think giving a message like "hash indexes are not WAL-logged and .." for unlogged tables doesn't seem like good behavior. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Scan Mode in Hash Index
On Wed, Sep 20, 2017 at 6:44 PM, Robert Haas wrote: > On Wed, Sep 20, 2017 at 7:45 AM, Amit Kapila wrote: >> Right, I was thinking from the perspective of the index entry. Before >> marking index entry as dead, we do check for heaptid. So, as heaptid >> can't be reused via Page-at-a-time index vacuum, scan won't mark index >> entry as dead. > > It can mark index entries dead, but if it does, they correspond to > heap TIDs that are still dead, as opposed to heap TIDs that have been > resurrected by being reused for an unrelated tuple. > > In other words, the danger scenario is this: > > 1. A page-at-a-time scan records all the TIDs on a page. > 2. VACUUM processes the page, removing some of those TIDs. > 3. VACUUM finishes, changing the heap TIDs from dead to unused. > 4. Somebody inserts a new tuple at one of the existing TIDs, and the > index tuple gets put on the page scanned in step 1. > 5. The page-at-a-time scan resumes and kills the tuple added in step 4 > by mistake, when it really only intended to kill a tuple removed in > step 2. > > What prevents this is: > > A. To begin scanning a bucket, VACUUM needs a cleanup lock on the > primary bucket page. Therefore, there are no scans in progress at the > time that VACUUM begins scanning the bucket. > > B. If a scan begins scanning the bucket, it can't pass VACUUM, because > VACUUM doesn't release the page lock on one page before taking the one > for the next page. > > C. After 0003, it becomes possible for a scan to pass VACUUM if the > table is permanent, but it won't be a problem because of the LSN > check. > That's right. So, in short, this patch handles the problematic scenario. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Scan Mode in Hash Index
On Wed, Sep 20, 2017 at 4:56 PM, Robert Haas wrote: > On Wed, Sep 20, 2017 at 7:19 AM, Amit Kapila wrote: >>> Page-at-a-time index vacuum as in _hash_vacuum_one_page doesn't matter >>> because such an operation doesn't allow TIDs to be reused. >> >> Page-at-a-time index vacuum also allows TIDs to be reused but this is >> done only for already marked dead items whereas vacuum can make the >> non-dead entries to be removed. We don't have a problem with >> page-at-a-time vacuum as it won't remove any items which the scan is >> going to mark as dead. > > I don't think page-at-a-time index vacuum allows heap TIDs to be > reused. > Right, I was thinking from the perspective of the index entry. Before marking index entry as dead, we do check for heaptid. So, as heaptid can't be reused via Page-at-a-time index vacuum, scan won't mark index entry as dead. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Scan Mode in Hash Index
On Wed, Sep 20, 2017 at 4:04 PM, Robert Haas wrote: > On Tue, Sep 19, 2017 at 11:34 PM, Amit Kapila wrote: >> This point has been discussed above [1] and to avoid this problem we >> are keeping the scan always behind vacuum for unlogged and temporary >> tables as we are doing without this patch. That will ensure vacuum >> won't be able to remove the TIDs which we are going to mark as dead. >> This has been taken care in patch 0003. I understand that this is >> slightly ugly, but the other alternative (as mentioned in the email >> [1]) is much worse. > > Hmm. So if I understand correctly, you're saying that the LSN check > in patch 0001 is actually completely unnecessary if we only apply > 0001, but is needed in preparation for 0003, after which it will > really be doing something? > Right. > In more detail, I suppose the idea is: a TID cannot be reused until a > VACUUM has intervened; VACUUM always visits every data page in the > index; we won't allow a scan to pass VACUUM (and thus possibly have > one of its TIDs get reused) except when the LSN check is actually > sufficient to guarantee no TID reuse (i.e. table is not unlogged). Right. > Page-at-a-time index vacuum as in _hash_vacuum_one_page doesn't matter > because such an operation doesn't allow TIDs to be reused. > Page-at-a-time index vacuum also allows TIDs to be reused but this is done only for already marked dead items whereas vacuum can make the non-dead entries to be removed. We don't have a problem with page-at-a-time vacuum as it won't remove any items which the scan is going to mark as dead. > Did I get it right? > I think so. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] why not parallel seq scan for slow functions
On Wed, Sep 20, 2017 at 3:05 AM, Jeff Janes wrote: > On Tue, Sep 19, 2017 at 1:17 PM, Thomas Munro > wrote: >> >> On Thu, Sep 14, 2017 at 3:19 PM, Amit Kapila >> wrote: >> > The attached patch fixes both the review comments as discussed above. > > > that should be fixed by turning costs off in the explain, as is the tradition. > Right. BTW, did you get a chance to run the original test (for which you reported the problem) with this patch? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] src/test/subscription/t/002_types.pl hanging on particular environment
On Wed, Sep 20, 2017 at 9:23 AM, Tom Lane wrote: > Craig Ringer writes: >> On 19 September 2017 at 18:04, Petr Jelinek >> wrote: >>> If you are asking why they are not identified by the >>> BackgroundWorkerHandle, then it's because it's a private struct and can't >>> be shared with other processes, so there is no way to link the logical >>> worker info with a bgworker directly. > >> I really want BackgroundWorkerHandle to be public, strong +1 from me. > > I'm confused about what you think that would accomplish. AFAICS, the > point of BackgroundWorkerHandle is to allow the creator/requestor of > a bgworker to verify whether or not the slot in question is still > "owned" by its request. > Right, but if we have access to BackgroundWorkerHandle, can we avoid maintaining additional information (in_use, generation, ...) in LogicalRepWorker that duplicates the bgworker machinery (which in turn would also avoid all the housekeeping for those variables)? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Setting pd_lower in GIN metapage
On Wed, Sep 20, 2017 at 4:25 AM, Michael Paquier wrote: > On Wed, Sep 20, 2017 at 12:47 AM, Tom Lane wrote: >> Amit Kapila writes: >>> On Tue, Sep 19, 2017 at 9:27 AM, Michael Paquier >>> wrote: >>>> I am not saying that no index AMs take advantage of FPW compressibility >>>> for their meta pages. There are cases like this one, as well as one >>>> code path in BRIN where this is useful, and it is useful as well when >>>> logging FPWs of the init forks for unlogged relations. >> >>> Hmm, why is it useful for logging FPWs of the init forks for unlogged >>> relations? We don't use REGBUF_STANDARD in those cases. >> >> But if we started to do so, that would be a concrete benefit of this >> patch ... > > In the proposed set of patches, all the empty() routines of the index > AMs which use log_newpage_buffer() (brin, gin, spgist) are doing the > right thing. btree also should have > its call to log_newpage updated in btbuildempty(), and your patch is > missing that. > We can add that for the btree patch. > Also, _hash_init() would need some extra work to > generate FPWs, but I don't think that it is necessary per its handling > of per-record metadata either. So REGBUF_STANDARD could be just > removed from there, and there is actually no need to patch > src/backend/access/hash at all. > I think there is no need to remove it. As per the discussion above, we want to keep REGBUF_STANDARD for all metapage initializations for the sake of consistency, and that will be useful for wal_consistency_checking, in which case we anyway need a full-page image. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Scan Mode in Hash Index
On Tue, Sep 19, 2017 at 9:49 PM, Robert Haas wrote: > On Thu, Aug 24, 2017 at 11:26 AM, Jesper Pedersen > wrote: >> Based on the feedback in this thread, I have moved the patch to "Ready for >> Committer". > > Reviewing 0001: > > _hash_readpage gets the page LSN to see if we can apply LP_DEAD hints, > but if the table is unlogged or temporary, the LSN will never change, > so the test in _hash_kill_items will always think that it's OK to > apply the hints. (This seems like it might be a pretty serious > problem, because I'm not sure what would be a viable workaround.) > This point has been discussed above [1] and to avoid this problem we are keeping the scan always behind vacuum for unlogged and temporary tables as we are doing without this patch. That will ensure vacuum won't be able to remove the TIDs which we are going to mark as dead. This has been taken care in patch 0003. I understand that this is slightly ugly, but the other alternative (as mentioned in the email [1]) is much worse. [1] - https://www.postgresql.org/message-id/CAA4eK1J6xiJUOidBaOt0iPsAdS0%2Bp5PoKFf1R2yVjTwrY_4snA%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] src/test/subscription/t/002_types.pl hanging on particular environment
On Tue, Sep 19, 2017 at 6:51 PM, Petr Jelinek wrote: > On 19/09/17 15:08, Amit Kapila wrote: >> >> I am not much aware of this area. Can you explain what other usages >> it has apart from in the process that has launched the worker and in >> worker itself? >> > > The subscription "DDL" commands and monitoring functions need access to > that info. > Got your point, but still not sure if we need to maintain additional information to replicate something similar to bgworker machinery. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] src/test/subscription/t/002_types.pl hanging on particular environment
On Tue, Sep 19, 2017 at 3:34 PM, Petr Jelinek wrote: > n 18/09/17 18:42, Tom Lane wrote: >> Amit Kapila writes: >>> On Mon, Sep 18, 2017 at 7:46 PM, Tom Lane wrote: >>>> The subscriber log includes >>>> 2017-09-18 08:43:08.240 UTC [15672] WARNING: out of background worker >>>> slots >>>> Maybe that's harmless, but I'm suspicious that it's a smoking gun. >> >>> I have noticed while fixing the issue you are talking that this path >>> is also susceptible to such problems. In >>> WaitForReplicationWorkerAttach(), it relies on >>> GetBackgroundWorkerPid() to know the status of the worker which won't >>> give the correct status in case of fork failure. The patch I have >>> posted has a fix for the issue, >> >> Your GetBackgroundWorkerPid fix looks good as far as it goes, but >> I feel that WaitForReplicationWorkerAttach's problems go way deeper >> than that --- in fact, that function should not exist at all IMO. >> >> The way that launcher.c is set up, it's relying on either the calling >> process or the called process to free the logicalrep slot when done. >> The called process might never exist at all, so the second half of >> that is obviously unreliable; but WaitForReplicationWorkerAttach >> can hardly be relied on either, seeing it's got a big fat exit-on- >> SIGINT in it. I thought about putting a PG_TRY, or more likely >> PG_ENSURE_ERROR_CLEANUP, into it, but that doesn't fix the basic >> problem: you can't assume that the worker isn't ever going to start, >> just because you got a signal that means you shouldn't wait anymore. >> >> Now, this rickety business might be fine if it were only about the >> states of the caller and called processes, but we've got long-lived >> shared data involved (ie the worker slots); failing to clean it up >> is not an acceptable outcome. >> > > We'll clean it up eventually if somebody requests creation of new > logical replication worker and that slot is needed. See the "garbage > collection" part in logicalrep_worker_launch(). 
I know it's not an ideal > solution, but it's the best one I could think of given the below. > >> So, frankly, I think we would be best off losing the "logical rep >> worker slot" business altogether, and making do with just bgworker >> slots. The alternative is getting the postmaster involved in cleaning >> up lrep slots as well as bgworker slots, and I'm going to resist >> any such idea strongly. That way madness lies, or at least an >> unreliable postmaster --- if we need lrep slots, what other forty-seven >> data structures are we going to need the postmaster to clean up >> in the future? >> >> I haven't studied the code enough to know why it grew lrep worker >> slots in the first place. Maybe someone could explain? >> > > I am not quite sure I understand this question; we need to store > additional info about workers in shared memory, so we need slots for that. > > If you are asking why they are not identified by the > BackgroundWorkerHandle, then it's because it's a private struct and can't > be shared with other processes, so there is no way to link the logical > worker info with a bgworker directly. > Not sure, but can't we identify the actual worker slot if we just have the pid of the background worker? IIUC, you need access to the BackgroundWorkerHandle from other processes so that you can stop them, as in the case of the DROP SUBSCRIPTION command. If so, then maybe storing the pid can serve the purpose. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Setting pd_lower in GIN metapage
On Tue, Sep 19, 2017 at 9:33 AM, Michael Paquier wrote: > On Tue, Sep 19, 2017 at 12:57 PM, Michael Paquier > wrote: >> I'd think about adjusting the comments the proper way for each AM so >> as one can read those comments and catch any limitation immediately. >> The fact this has not been pointed out on this thread before any >> checks and the many mails exchanged on the matter on this thread make >> me think that the current code does not outline the current code >> properties appropriately. > > Another thing that we could consider as well is adding an assertion in > XLogRegisterBuffer & friends so as the combination of REGBUF_STANDARD > and REGBUF_NO_IMAGE is forbidden. That's bugging me as well. > Is that related to this patch? If not, then maybe we can discuss it in a separate thread. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Setting pd_lower in GIN metapage
On Tue, Sep 19, 2017 at 9:27 AM, Michael Paquier wrote: > On Tue, Sep 19, 2017 at 12:40 PM, Amit Kapila wrote: >> On Mon, Sep 18, 2017 at 4:03 PM, Michael Paquier >> wrote: >>> On Mon, Sep 18, 2017 at 11:16 AM, Amit Kapila >>> wrote: >>>> On Sat, Sep 16, 2017 at 7:15 PM, Michael Paquier >>>> wrote: >>>>> >>>> >>>> You have already noticed above that it will help when >>>> wal_checking_consistency is used and that was the main motivation to >>>> pass REGBUF_STANDARD apart from maintaining consistency. It is not >>>> clear to me what is bothering you. If your only worry about these >>>> patches is that you want this sentence to be removed from the comment >>>> because you think it is obvious or doesn't make much sense, then I >>>> think we can leave this decision to committer. I have added it based >>>> on Tom's suggestion above thread about explaining why it is >>>> inessential or essential to set pd_lower. I think Amit Langote just >>>> tried to mimic what I have done in hash and btree patches to maintain >>>> consistency. I am also not very sure if we should write some detailed >>>> comment or leave the existing comment as it is. I think it is just a >>>> matter of different perspective. >>> >>> What is disturbing me a bit is that the existing comments mention >>> something that could be supported (the compression of pages), but >>> that's actually not done and this is unlikely to happen because the >>> number of bytes associated to a meta page is going to be always >>> cheaper than a FPW, which would cost in CPU to store it for >>> compression is enabled. So I think that we should switch comments to >>> mention that pd_lower is set so as this helps with page masking, but >>> we don't take advantage of XLOG compression in the code. >>> >> >> I think that is not true because we do need FPW for certain usages of >> metapage. 
Consider a case in _hash_doinsert where we register metabuf >> with just >> REGBUF_STANDARD; it can take advantage of removing the hole if >> pd_lower is set to its correct position. > I am not saying that no index AMs take advantage of FPW compressibility > for their meta pages. There are cases like this one, as well as one > code path in BRIN where this is useful, and it is useful as well when > logging FPWs of the init forks for unlogged relations. > Hmm, why is it useful for logging FPWs of the init forks for unlogged relations? We don't use REGBUF_STANDARD in those cases. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] src/test/subscription/t/002_types.pl hanging on particular environment
On Tue, Sep 19, 2017 at 6:29 PM, Petr Jelinek wrote: > On 19/09/17 14:33, Amit Kapila wrote: >> On Tue, Sep 19, 2017 at 3:34 PM, Petr Jelinek >> wrote: >>> n 18/09/17 18:42, Tom Lane wrote: >>> >>>> So, frankly, I think we would be best off losing the "logical rep >>>> worker slot" business altogether, and making do with just bgworker >>>> slots. >> >> I think that would be cleaner as compared to what we have now. >> >>>> The alternative is getting the postmaster involved in cleaning >>>> up lrep slots as well as bgworker slots, and I'm going to resist >>>> any such idea strongly. That way madness lies, or at least an >>>> unreliable postmaster --- if we need lrep slots, what other forty-seven >>>> data structures are we going to need the postmaster to clean up >>>> in the future? >>>> >>>> I haven't studied the code enough to know why it grew lrep worker >>>> slots in the first place. Maybe someone could explain? >>>> >>> >>> I am not quite sure I understand this question, we need to store >>> additional info about workers in shared memory so we need slots for that. >>> >> >> Yeah, but you could have used the way we do for parallel query where >> we setup dsm and share all such information. You can check the logic >> of execparallel.c and parallel.c to see how we do all such stuff for >> parallel query. >> > > I don't understand how DSM helps in this case (except perhaps the memory > allocation being dynamic rather than static). We still need this shared > memory area to be accessible from other places than launcher (the > paralllel query has one lead which manages everything, that's not true > for logical replication) > I am not much aware of this area. Can you explain what other usages it has apart from in the process that has launched the worker and in worker itself? > and we need it to survive restart of launcher > for example, so all the current issues will stay. > Do you mean to say that you want to save this part of shared memory across restart of launcher? 
-- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] src/test/subscription/t/002_types.pl hanging on particular environment
On Tue, Sep 19, 2017 at 3:34 PM, Petr Jelinek wrote: > On 18/09/17 18:42, Tom Lane wrote: > >> So, frankly, I think we would be best off losing the "logical rep >> worker slot" business altogether, and making do with just bgworker >> slots. I think that would be cleaner as compared to what we have now. >> The alternative is getting the postmaster involved in cleaning >> up lrep slots as well as bgworker slots, and I'm going to resist >> any such idea strongly. That way madness lies, or at least an >> unreliable postmaster --- if we need lrep slots, what other forty-seven >> data structures are we going to need the postmaster to clean up >> in the future? >> >> I haven't studied the code enough to know why it grew lrep worker >> slots in the first place. Maybe someone could explain? >> > > I am not quite sure I understand this question; we need to store > additional info about workers in shared memory, so we need slots for that. > Yeah, but you could have used the way we do it for parallel query, where we set up a DSM segment and share all such information. You can check the logic of execParallel.c and parallel.c to see how we do all such stuff for parallel query. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Setting pd_lower in GIN metapage
On Mon, Sep 18, 2017 at 4:03 PM, Michael Paquier wrote: > On Mon, Sep 18, 2017 at 11:16 AM, Amit Kapila wrote: >> On Sat, Sep 16, 2017 at 7:15 PM, Michael Paquier >> wrote: >>> >> >> You have already noticed above that it will help when >> wal_checking_consistency is used and that was the main motivation to >> pass REGBUF_STANDARD apart from maintaining consistency. It is not >> clear to me what is bothering you. If your only worry about these >> patches is that you want this sentence to be removed from the comment >> because you think it is obvious or doesn't make much sense, then I >> think we can leave this decision to committer. I have added it based >> on Tom's suggestion above thread about explaining why it is >> inessential or essential to set pd_lower. I think Amit Langote just >> tried to mimic what I have done in hash and btree patches to maintain >> consistency. I am also not very sure if we should write some detailed >> comment or leave the existing comment as it is. I think it is just a >> matter of different perspective. > > What is disturbing me a bit is that the existing comments mention > something that could be supported (the compression of pages), but > that's actually not done and this is unlikely to happen because the > number of bytes associated to a meta page is going to be always > cheaper than a FPW, which would cost in CPU to store it for > compression is enabled. So I think that we should switch comments to > mention that pd_lower is set so as this helps with page masking, but > we don't take advantage of XLOG compression in the code. > I think that is not true because we do need FPW for certain usages of metapage. Consider a case in _hash_doinsert where register metabuf with just REGBUF_STANDARD, it can take advantage of removing the hole if pd_lower is set to its correct position. There are other similar usages in hash index. 
For other indexes like btree, there is no such usage currently, but they can also take advantage of it for wal_consistency_checking. Now, one could probably argue that we should use different comments for different indexes as the usage varies, but I think someone looking at the code after reading the comments can differentiate such cases. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallel.c oblivion of worker-startup failures
On Mon, Sep 18, 2017 at 10:00 PM, Tom Lane wrote: > Amit Kapila writes: >> Attached patch fixes these problems. > > Hmm, this patch adds a kill(notify_pid) after one call to > ForgetBackgroundWorker, but the postmaster has several more such calls. > Shouldn't they all notify the notify_pid? Should we move that > functionality into ForgetBackgroundWorker itself, so we can't forget > it again? > Among other places, we already notify during ReportBackgroundWorkerExit(). However, it seems to me that all the other places, except where this patch has added a call to notify, don't need such a call. The calls in DetermineSleepTime and ResetBackgroundWorkerCrashTimes are for a crashed worker, which I don't think requires notification to the backend, as that backend itself would have restarted. The other place where we call ForgetBackgroundWorker is in maybe_start_bgworkers when rw_terminate is set to true, which again seems to be either the case of a worker crash or of someone having explicitly asked to terminate the worker, in which case we already send a notification. I think we need to notify the backend on start, stop and failure to start a worker. OTOH, if it is harmless to send a notification even after the worker has crashed, then we can just move that functionality into ForgetBackgroundWorker itself, as that will simplify the code; in fact, that is the first thing that occurred to me as well, but I didn't do it that way as I was not sure if we want to send a notification in all kinds of cases. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] src/test/subscription/t/002_types.pl hanging on particular environment
On Mon, Sep 18, 2017 at 7:46 PM, Tom Lane wrote: > Thomas Munro writes: >> In this build you can see the output of the following at the end, >> which might provide clues to the initiated. You might need to click a >> small triangle to unfold the commands' output. > >> cat ./src/test/subscription/tmp_check/log/002_types_publisher.log >> cat ./src/test/subscription/tmp_check/log/002_types_subscriber.log >> cat ./src/test/subscription/tmp_check/log/regress_log_002_types > > The subscriber log includes > 2017-09-18 08:43:08.240 UTC [15672] WARNING: out of background worker slots > 2017-09-18 08:43:08.240 UTC [15672] HINT: You might need to increase > max_worker_processes. > > Maybe that's harmless, but I'm suspicious that it's a smoking gun. > I think perhaps this reflects a failed attempt to launch a worker, > which the caller does not realize has failed to launch because of the > lack of worker-fork-failure error recovery I bitched about months ago > [1], leading to subscription startup waiting forever for a worker that's > never going to report finishing. > > I see Amit K. just posted a patch in that area [2], haven't looked at it > yet. > I have noticed while fixing the issue you are talking that this path is also susceptible to such problems. In WaitForReplicationWorkerAttach(), it relies on GetBackgroundWorkerPid() to know the status of the worker which won't give the correct status in case of fork failure. The patch I have posted has a fix for the issue, however, this could be an entirely different issue altogether as it appears from your next email in this thread. [1] - https://www.postgresql.org/message-id/CAA4eK1KDfKkvrjxsKJi3WPyceVi3dH1VCkbTJji2fuwKuB%3D3uw%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GatherMerge misses to push target list
On Mon, Sep 18, 2017 at 12:46 PM, Rushabh Lathia wrote: > Thanks Amit for the patch. > > I reviewed the code changes as well as performed more testing. Patch > looks good to me. > In that case, can you please mark the patch [1] as ready for committer in CF app > Here is the updated patch - where added test-case clean up. > oops, missed dropping the function. Thanks for the review. [1] - https://commitfest.postgresql.org/15/1293/ -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
[HACKERS] parallel.c oblivion of worker-startup failures
Some time back, Tom Lane reported [1] the $Subject issue. I have looked into it and found that the problem is not limited to parallel workers: the general background worker machinery is affected as well in situations where fork() or some similar failure occurs. The first problem is that after we register a dynamic worker, the means of learning whether the worker has started (WaitForBackgroundWorkerStartup/GetBackgroundWorkerPid) won't give the right answer if a fork failure happens. Also, in cases where the worker is marked not to restart after a crash, the postmaster doesn't notify the backend when it is unable to start the worker, which can make the backend wait forever, since it is oblivious to the fact that the worker never started. Apart from these general problems in the background worker machinery, parallel.c assumes that once it has registered the dynamic workers, they will start and perform their work. We need to ensure that if the postmaster is unable to start parallel workers due to a fork failure or any similar situation, the backend doesn't keep waiting forever. To fix this, before waiting for workers to finish, we can check whether the worker exists at all. The attached patch fixes these problems. Another way to fix the parallel-query-related problem would be for the master backend, after registering the workers, to wait for the workers to start before setting up the queues used to communicate with them; I think that could be quite expensive. Thoughts? [1] - https://www.postgresql.org/message-id/4905.1492813...@sss.pgh.pa.us -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com fix_worker_startup_failures_v1.patch Description: Binary data
Re: [HACKERS] Setting pd_lower in GIN metapage
On Sat, Sep 16, 2017 at 7:15 PM, Michael Paquier wrote: > On Fri, Sep 15, 2017 at 4:22 PM, Amit Langote > wrote: >> On 2017/09/14 16:00, Michael Paquier wrote: >>> On Wed, Sep 13, 2017 at 4:43 PM, Amit Langote >>> wrote: >>>> Sure, no problem. >>> >>> OK, I took a closer look at all patches, but did not run any manual >>> tests to test the compression except some stuff with >>> wal_consistency_checking. >> >> Thanks for the review. >> >>> For SpGist, I think that there are two missing: spgbuild() and >>> spgGetCache(). >> >> spgbuild() calls SpGistInitMetapage() before marking the metapage buffer >> dirty. The latter already sets pd_lower correctly, so we don't need to do >> it explicitly in spgbuild() itself. > > Check. > >> spgGetCache() doesn't write the metapage, only reads it: >> >> /* Last, get the lastUsedPages data from the metapage */ >> metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO); >> LockBuffer(metabuffer, BUFFER_LOCK_SHARE); >> >> metadata = SpGistPageGetMeta(BufferGetPage(metabuffer)); >> >> if (metadata->magicNumber != SPGIST_MAGIC_NUMBER) >> elog(ERROR, "index \"%s\" is not an SP-GiST index", >> RelationGetRelationName(index)); >> >> cache->lastUsedPages = metadata->lastUsedPages; >> >> UnlockReleaseBuffer(metabuffer); >> >> So, I think it won't be correct to set pd_lower here, no? > > Yeah, I am just reading the code again and there is no alarm here. > >> Updated patch attached, which implements your suggested changes to the >> masking functions. >> >> By the way, as I noted on another unrelated thread, I will not be able to >> respond to emails from tomorrow until 9/23. > > No problem. Enjoy your vacations. > > I have spent some time looking at the XLOG insert code, and tested the > compressibility of the meta pages. 
And I have noticed that all pages > registered with REGBUF_WILL_INIT will force XLogRecordAssemble to not > take a FPW of the block registered because the page will be > reinitialized at replay, and in such cases the zero'ed page is filled > with the data from the record. log_newpage is used to initialize init > forks for unlogged relations, which is where this patch allows page > compression to take effect correctly. Still setting REGBUF_STANDARD > with REGBUF_WILL_INIT is actually a no-op, except if > wal_consistency_checking is used when registering a buffer for WAL > insertion. There is one code path though where things are useful all > the time: revmap_physical_extend for BRIN. > > The patch is using the correct logic, still such comments are in my > opinion incorrect because of the reason written above: > +* This won't be of any help unless we use option like REGBUF_STANDARD > +* while registering the buffer for metapage as otherwise, it won't try to > +* remove the hole while logging the full page image. > Here REGBUF_STANDARD is actually a no-op for btree. > You have already noticed above that it helps when wal_consistency_checking is used, and that was the main motivation to pass REGBUF_STANDARD, apart from maintaining consistency. It is not clear to me what is bothering you. If your only worry about these patches is that you want this sentence removed from the comment because you think it is obvious or doesn't make much sense, then I think we can leave this decision to the committer. I have added it based on Tom's suggestion upthread about explaining why it is essential or inessential to set pd_lower. I think Amit Langote just tried to mimic what I have done in the hash and btree patches to maintain consistency. I am also not very sure whether we should write a more detailed comment or leave the existing comment as it is; I think it is just a matter of different perspective. -- With Regards, Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Append implementation
On Thu, Sep 14, 2017 at 8:30 PM, Amit Khandekar wrote: > On 11 September 2017 at 18:55, Amit Kapila wrote: >>> >> >> How? See, if you have four partial subpaths and two non-partial >> subpaths, then for parallel-aware append it considers all six paths in >> parallel path whereas for non-parallel-aware append it will consider >> just four paths and that too with sub-optimal strategy. Can you >> please try to give me some example so that it will be clear. > > Suppose 4 appendrel children have costs for their cheapest partial (p) > and non-partial paths (np) as shown below : > > p1=5000 np1=100 > p2=200 np2=1000 > p3=80 np3=2000 > p4=3000 np4=50 > > Here, following two Append paths will be generated : > > 1. a parallel-aware Append path with subpaths : > np1, p2, p3, np4 > > 2. Partial (i.e. non-parallel-aware) Append path with all partial subpaths: > p1,p2,p3,p4 > > Now, one thing we can do above is : Make the path#2 parallel-aware as > well; so both Append paths would be parallel-aware. > Yes, we can do that, and that is what I think is probably better. So the question remains: in which case will a non-parallel-aware partial append be required? Basically, it is not clear to me why, after having a parallel-aware partial append, we would need the non-parallel-aware version. Are you keeping it for the sake of backward compatibility, or for cases where someone has disabled parallel append with the help of the new GUC in this patch? > > So above, what I am saying is, we can't tell which of the paths #1 and > #2 are cheaper until we calculate total cost. I didn't understand what > did you mean by "non-parallel-aware append will consider only the > partial subpaths and that too with sub-optimal strategy" in the above > example. I guess, you were considering a different scenario than the > above one. > Yes, something different, but I think you can ignore that, as we can discuss the guts of my point based on the example you gave above. -- With Regards, Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Append implementation
On Thu, Sep 14, 2017 at 9:41 PM, Robert Haas wrote: > On Mon, Sep 11, 2017 at 9:25 AM, Amit Kapila wrote: >> I think the patch stores only non-partial paths in decreasing order, >> what if partial paths having more costs follows those paths? > > The general picture here is that we don't want the leader to get stuck > inside some long-running operation because then it won't be available > to read tuples from the workers. On the other hand, we don't want to > just have the leader do no work because that might be slow. And in > most cases, the leader will be the first participant to arrive at > the Append node, because of the worker startup time. So the idea is > that the workers should pick expensive things first, and the leader > should pick cheap things first. > At a broader level, the idea is good, but I think it won't turn out exactly like that, considering your paragraph quoted here, which indicates that it is okay if the leader picks a partial path that is costly among other partial paths, since the leader won't be stuck with it. Granting that this is a good design for parallel append, the question is: do we really need the worker and leader to follow separate strategies for choosing the next path? I think the patch will be simpler if we can come up with a way for the worker and leader to use the same strategy to pick the next path to process. How about we arrange the list of paths such that all partial paths come first, followed by the non-partial paths, probably both in decreasing order of cost? Then both the leader and the workers can start from the beginning of the list. In most cases, the leader will start at the first partial path and will only ever need to take a non-partial path if there is no other partial path left. This is not bulletproof, as it is possible that some worker starts before the leader, in which case the leader might take a non-partial path before all partial paths are finished, but I think we can avoid that as well if we are too worried about such cases. 
-- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] parallelize queries containing initplans
On Thu, Aug 31, 2017 at 11:23 AM, Amit Kapila wrote: > On Mon, Aug 21, 2017 at 2:40 PM, Amit Kapila wrote: >> On Mon, Aug 21, 2017 at 1:44 PM, Haribabu Kommi >> wrote: >>> >>> >>> Thanks for adding more details. It is easy to understand. >>> >>> I marked the patch as ready for committer in the commitfest. >>> > > Rebased the patch. The output of test case added by the patch is also > slightly changed because of the recent commit > 7df2c1f8daeb361133ac8bdeaf59ceb0484e315a. I have verified that the > new test result is as expected. > The latest patch again needed a rebase; please find the rebased patch attached to this email. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com pq_pushdown_initplan_v9.patch Description: Binary data
Re: [HACKERS] Setting pd_lower in GIN metapage
On Thu, Sep 14, 2017 at 12:30 PM, Michael Paquier wrote: > On Wed, Sep 13, 2017 at 4:43 PM, Amit Langote > wrote: >> Sure, no problem. > > > +* > +* This won't be of any help unless we use option like REGBUF_STANDARD > +* while registering the buffer for metapage as otherwise, it won't try to > +* remove the hole while logging the full page image. > */ > This comment is in the btree code. But you actually add > REGBUF_STANDARD. So I think that this could be just removed. > > * Set pd_lower just past the end of the metadata. This is not essential > -* but it makes the page look compressible to xlog.c. > +* but it makes the page look compressible to xlog.c. See > +* _bt_initmetapage. > This reference could be removed as well as _bt_initmetapage does not > provide any information, the existing comment is wrong without your > patch, and then becomes right with this patch. > I have added this comment just to give some explanation as to why we are setting pd_lower and what makes it useful. We can change it or remove it, but I am not sure what the right thing to do here is; maybe we can defer this to the committer. > > It seems to me that an update of pd_lower is missing in _bt_getroot(), > just before marking the buffer as dirty I think. And there is a second > in _bt_unlink_halfdead_page(), a third in _bt_insertonpg() and finally > one in _bt_newroot(). > > > For hash, hashbulkdelete(), _hash_vacuum_one_page(), > _hash_addovflpage(), _hash_freeovflpage() and _hash_doinsert() are > missing the shot, no? We could have a meta page of a hash index > upgraded from PG10. > Why do we need to change the metapage in every place for btree or hash? Any index that is upgraded should have pd_lower set; do you have any case in mind where it won't be? For hash, if someone upgrades from a version lower than 9.6, it might not be set, but we already give a warning to reindex hash indexes upgraded from a version lower than 10. -- With Regards, Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] GatherMerge misses to push target list
On Wed, Sep 13, 2017 at 5:30 PM, Rushabh Lathia wrote: > On Wed, Sep 6, 2017 at 10:04 AM, Amit Kapila > wrote: >> > > > This seems like a good optimization. I tried to simulate the test given > in the mail, initially wasn't able to generate the exact test - as index > creation is missing in the test shared. > Oops. > I also won't consider this as a bug, but it's definitely a good optimization > for GatherMerge. > >> >> >> Note - If we agree on the problems and fix, then I can add regression >> tests to cover above cases in the patch. > > > Sure, once you do that - I will review the patch. > The attached patch contains a regression test as well. Thanks for looking into it. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com pushdown_target_gathermerge_v2.patch Description: Binary data
Re: [HACKERS] why not parallel seq scan for slow functions
On Wed, Sep 13, 2017 at 9:39 AM, Amit Kapila wrote: > On Tue, Sep 12, 2017 at 5:47 PM, Amit Khandekar > wrote: >> On 5 September 2017 at 14:04, Amit Kapila wrote: >> >> I started with a quick review ... a couple of comments below : >> >> - * If this is a baserel, consider gathering any partial paths we may have >> - * created for it. (If we tried to gather inheritance children, we could >> + * If this is a baserel and not the only rel, consider gathering any >> + * partial paths we may have created for it. (If we tried to gather >> >> /* Create GatherPaths for any useful partial paths for rel */ >> - generate_gather_paths(root, rel); >> + if (lev < levels_needed) >> + generate_gather_paths(root, rel, NULL); >> >> I think at the above two places, and may be in other place also, it's >> better to mention the reason why we should generate the gather path >> only if it's not the only rel. >> > > I think the comment you are looking is present where we are calling > generate_gather_paths in grouping_planner. Instead of adding same or > similar comment at multiple places, how about if we just say something > like "See in grouping_planner where we generate gather paths" at all > other places? > >> -- >> >> - if (rel->reloptkind == RELOPT_BASEREL) >> - generate_gather_paths(root, rel); >> + if (rel->reloptkind == RELOPT_BASEREL && >> root->simple_rel_array_size > 2) >> + generate_gather_paths(root, rel, NULL); >> >> Above, in case it's a partitioned table, root->simple_rel_array_size >> includes the child rels. So even if it's a simple select without a >> join rel, simple_rel_array_size would be > 2, and so gather path would >> be generated here for the root table, and again in grouping_planner(). >> > > Yeah, that could be a problem. I think we should ensure that there is > no append rel list by checking root->append_rel_list. Can you think > of a better way to handle it? > The attached patch fixes both the review comments as discussed above. -- With Regards, Amit Kapila. 
EnterpriseDB: http://www.enterprisedb.com parallel_paths_include_tlist_cost_v3.patch Description: Binary data
Re: [HACKERS] why not parallel seq scan for slow functions
On Tue, Sep 12, 2017 at 5:47 PM, Amit Khandekar wrote: > On 5 September 2017 at 14:04, Amit Kapila wrote: > > I started with a quick review ... a couple of comments below : > > - * If this is a baserel, consider gathering any partial paths we may have > - * created for it. (If we tried to gather inheritance children, we could > + * If this is a baserel and not the only rel, consider gathering any > + * partial paths we may have created for it. (If we tried to gather > > /* Create GatherPaths for any useful partial paths for rel */ > - generate_gather_paths(root, rel); > + if (lev < levels_needed) > + generate_gather_paths(root, rel, NULL); > > I think at the above two places, and may be in other place also, it's > better to mention the reason why we should generate the gather path > only if it's not the only rel. > I think the comment you are looking for is present where we are calling generate_gather_paths in grouping_planner. Instead of adding the same or a similar comment at multiple places, how about if we just say something like "See in grouping_planner where we generate gather paths" at all other places? > -- > > - if (rel->reloptkind == RELOPT_BASEREL) > - generate_gather_paths(root, rel); > + if (rel->reloptkind == RELOPT_BASEREL && > root->simple_rel_array_size > 2) > + generate_gather_paths(root, rel, NULL); > > Above, in case it's a partitioned table, root->simple_rel_array_size > includes the child rels. So even if it's a simple select without a > join rel, simple_rel_array_size would be > 2, and so gather path would > be generated here for the root table, and again in grouping_planner(). > Yeah, that could be a problem. I think we should ensure that there is no append rel list by checking root->append_rel_list. Can you think of a better way to handle it? -- With Regards, Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Setting pd_lower in GIN metapage
On Tue, Sep 12, 2017 at 3:51 PM, Amit Langote wrote: > On 2017/09/11 18:13, Michael Paquier wrote: >> On Mon, Sep 11, 2017 at 5:40 PM, Amit Langote wrote: >>> On 2017/09/10 15:22, Michael Paquier wrote: >>>> Coordinating efforts here would be nice. If you, Amit K, are taking >>>> care of a patch for btree and hash, would you, Amit L, write the part >>>> for GIN, BRIN and SpGist? This needs a careful lookup as many code >>>> paths need a lookup so it may take time. Please note that I don't mind >>>> double-checking this part if you don't have enough room to do so. >>> >>> Sorry, I didn't have time today to carefully go through the recent >>> discussion on this thread (starting with Tom's email wherein he said he >>> set the status of the patch to Waiting on Author). I will try tomorrow. >> >> Thanks for the update! Once you get to this point, please let me know >> if you would like to work on a more complete patch for brin, gin and >> spgist. If you don't have enough room, I am fine to produce something. > > I updated the patches for GIN, BRIN, and SP-GiST to include the following > changes: > > 1. Pass REGBUF_STANDARD flag when registering the metapage buffer > I have looked into the brin patch, and it seems you have not considered all usages of the meta page. The BrinRevmap structure also contains a reference to the meta page buffer, and when that is modified (e.g., in revmap_physical_extend), I think you also need to consider using the REGBUF_STANDARD flag. > > Did I miss something from the discussion? > I think one point which might have been missed is that the patch needs to modify pd_lower for all usages of the metapage, not only when it is first initialized. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] GatherMerge misses to push target list
On Wed, Sep 6, 2017 at 10:04 AM, Amit Kapila wrote: > During my recent work on costing of parallel paths [1], I noticed that > we are missing to push target list below GatherMerge in some simple > cases like below. > This could be considered a bug fix for 10.0, but since it doesn't block any functionality or give wrong results, we might instead treat it as an optimization for GatherMerge. In any case, I have added this to the next CF. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Append implementation
On Mon, Sep 11, 2017 at 4:49 PM, Amit Khandekar wrote: > On 8 September 2017 at 19:17, Amit Kapila wrote: >> >> In that case, why can't we keep the workers also process in same >> order, what is the harm in that? > > Because of the way the logic of queuing works, the workers finish > earlier if they start with expensive plans first. For e.g. : with 3 > plans with costs 8, 4, 4 and with 2 workers w1 and w2, they will > finish in 8 time units (w1 will finish plan 1 in 8, while in parallel > w2 will finish the remaining 2 plans in 8 units. Whereas if the plans > are ordered like : 4, 4, 8, then the workers will finish in 12 time > units (w1 and w2 will finish each of the 1st two plans in 4 units, and > then w1 or w2 will take up plan 3 and finish in 8 units, while the > other worker remains idle). > I think the patch stores only non-partial paths in decreasing order, what if partial paths having more costs follows those paths? > >> >> Few more comments: >> >> 1. >> + else if (IsA(subpath, MergeAppendPath)) >> + { >> + MergeAppendPath *mpath = (MergeAppendPath *) subpath; >> + >> + /* >> + * If at all MergeAppend is partial, all its child plans have to be >> + * partial : we don't currently support a mix of partial and >> + * non-partial MergeAppend subpaths. >> + */ >> + if (is_partial) >> + return list_concat(partial_subpaths, list_copy(mpath->subpaths)); >> >> In which situation partial MergeAppendPath is generated? Can you >> provide one example of such path? > > Actually currently we don't support partial paths for MergeAppendPath. > That code just has that if condition (is_partial) but currently that > condition won't be true for MergeAppendPath. > I think then it is better to have Assert for MergeAppend. It is generally not a good idea to add code which we can never hit. >> >> 2. >> add_paths_to_append_rel() .. 
>> >> I think it might be better to add a sentence why we choose a different >> way to decide a number of workers in the second case >> (non-parallel-aware append). > > Yes, I agree. Will do that after we conclude with your next point below ... > >> Do you think non-parallel-aware Append >> will be better in any case when there is a parallel-aware append? I >> mean to say let's try to create non-parallel-aware append only when >> parallel-aware append is not possible. > > By non-parallel-aware append, I am assuming you meant partial > non-parallel-aware Append. Yes, if the parallel-aware Append path has > *all* partial subpaths chosen, then we do omit a partial non-parallel > Append path, as seen in this code in the patch : > > /* > * Consider non-parallel partial append path. But if the parallel append > * path is made out of all partial subpaths, don't create another partial > * path; we will keep only the parallel append path in that case. > */ > if (partial_subpaths_valid && !pa_all_partial_subpaths) > { > .. > } > > But if the parallel-Append path has a mix of partial and non-partial > subpaths, then we can't really tell which of the two could be cheapest > until we calculate the cost. It can be that the non-parallel-aware > partial Append can be cheaper as well. > How? See, if you have four partial subpaths and two non-partial subpaths, then for parallel-aware append it considers all six paths in parallel path whereas for non-parallel-aware append it will consider just four paths and that too with sub-optimal strategy. Can you please try to give me some example so that it will be clear. >> >> 4. >> - select count(*) from a_star; >> -select count(*) from a_star; >> + select round(avg(aa)), sum(aa) from a_star; >> +select round(avg(aa)), sum(aa) from a_star; >> >> Why you have changed the existing test. It seems count(*) will also >> give what you are expecting. > > Needed to do cover some data testing with Parallel Append execution. > Okay. 
One more thing: I think you might want to update the comment atop add_paths_to_append_rel to reflect the new type of parallel-aware append path being generated. In particular, I am referring to the below part of the comment: * Similarly it collects partial paths from * non-dummy children to create partial append paths. */ static void add_paths_to_append_rel() -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Setting pd_lower in GIN metapage
On Mon, Sep 11, 2017 at 12:43 PM, Michael Paquier wrote: > On Mon, Sep 11, 2017 at 4:07 PM, Amit Kapila wrote: >> I have prepared separate patches for hash and btree index. I think >> for another type of indexes, it is better to first fix the pd_lower >> issue. > > Just wondering (sorry I have not looked at your patch in details)... > Have you tested the compressibility effects of this patch on FPWs with > and without wal_compression? > I have debugged it to confirm that it executes the code path that eliminates the hole for the hash index. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com