Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
> -----Original Message-----
> From: Etsuro Fujita [mailto:fujita.ets...@lab.ntt.co.jp]
> Sent: Tuesday, October 20, 2015 1:11 PM
> To: Robert Haas
> Cc: Tom Lane; Kaigai Kouhei(海外 浩平); Kyotaro HORIGUCHI; pgsql-hackers@postgresql.org; Shigeru Hanada
> Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
>
> On 2015/10/20 5:34, Robert Haas wrote:
> > On Mon, Oct 19, 2015 at 3:45 AM, Etsuro Fujita wrote:
> >> As Tom mentioned, just recomputing the original join tuple is not good
> >> enough. We would need to rejoin the test tuples for the baserels even
> >> if ROW_MARK_COPY is in use. Consider:
> >>
> >> A=# BEGIN;
> >> A=# UPDATE t SET a = a + 1 WHERE b = 1;
> >> B=# SELECT * from t, ft1, ft2
> >>     WHERE t.a = ft1.a AND t.b = ft2.b AND ft1.c = ft2.c FOR UPDATE;
> >> A=# COMMIT;
> >>
> >> where the plan for the SELECT FOR UPDATE is
> >>
> >> LockRows
> >> -> Nested Loop
> >>    -> Seq Scan on t
> >>    -> Foreign Scan on
> >>       Remote SQL: SELECT * FROM ft1 JOIN ft2 WHERE ft1.c = ft2.c
> >>       AND ft1.a = $1 AND ft2.b = $2
> >>
> >> If an EPQ recheck is invoked by A's UPDATE, just recomputing the
> >> original join tuple from the whole-row image that you proposed would
> >> output an incorrect result in the EPQ recheck, since the value a in the
> >> updated version of a to-be-joined tuple in t would no longer match the
> >> value ft1.a extracted from the whole-row image if A's UPDATE has
> >> committed successfully. So I think we would need to rejoin the tuples
> >> populated from the whole-row images for the baserels ft1 and ft2, by
> >> executing the secondary plan with the new parameter values for a and b.
> >
> > No. You just need to populate fdw_recheck_quals correctly, same as
> > for the scan case.
>
> Yeah, I think we can probably do that for the case where a pushed-down
> join clause is an inner-join one, but I'm not sure that we can do that
> for the case where that clause is an outer-join one. Maybe I'm missing
> something, though.
>
Please check my message yesterday. The non-nullable side of an outer join is always visible regardless of the join clauses pushed down, as long as it satisfies the scan quals pushed down.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei
Re: [HACKERS] SuperUser check in pg_stat_statements
Jim, I already tried to create a view on top of pg_stat_statements, but no luck.
Re: [HACKERS] Dangling Client Backend Process
On 19 October 2015 21:37, Robert Haas [mailto:robertmh...@gmail.com] wrote:

> On Sat, Oct 17, 2015 at 4:52 PM, Alvaro Herrera wrote:
> > Andres Freund wrote:
> > > On 2015-10-14 17:33:01 +0900, Kyotaro HORIGUCHI wrote:
> > > > If I recall correctly, he was concerned about killing backends
> > > > running transactions which could be saved. I have some sympathy
> > > > with that opinion.
> > >
> > > I still don't. Leaving backends alive after postmaster has died
> > > prevents the auto-restart mechanism from working from there on.
> > > Which means that we'll potentially continue happily after another
> > > backend has PANICed and potentially corrupted shared memory. Which
> > > isn't all that unlikely if postmaster isn't around anymore.
> >
> > I agree. When postmaster terminates without waiting for all backends
> > to go away, things are going horribly wrong -- either a DBA has done
> > something stupid, or the system is misbehaving. As Andres says, if
> > another backend dies at that point, things are even worse -- the dying
> > backend could have been holding a critical lwlock, for instance, or it
> > could have corrupted shared buffers on its way out. It is not
> > sensible to leave the rest of the backends in the system still trying
> > to run just because there is no one there to kill them.
>
> Yep. +1 for changing this.

Seems many people are in favor of this change. I have made changes to handle backend exit on postmaster death (after the backend has finished its work and is waiting for a new command). The changes follow the approach explained in my earlier mail, i.e.:

1. WaitLatchOrSocket, called from the secure_read and secure_write functions, now waits on an additional event, WL_POSTMASTER_DEATH.

2. There is a possibility that a command is read without waiting on the latch. This case is handled by checking postmaster status after the command has been read (i.e. after ReadCommand).

Attached is the patch.

Thanks and Regards,
Kumar Rajeev Rastogi

Attachment: dangling_backend_process.patch
Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
On 2015/10/20 5:34, Robert Haas wrote:
> On Mon, Oct 19, 2015 at 3:45 AM, Etsuro Fujita wrote:
>> As Tom mentioned, just recomputing the original join tuple is not good
>> enough. We would need to rejoin the test tuples for the baserels even
>> if ROW_MARK_COPY is in use. Consider:
>>
>> A=# BEGIN;
>> A=# UPDATE t SET a = a + 1 WHERE b = 1;
>> B=# SELECT * from t, ft1, ft2
>>     WHERE t.a = ft1.a AND t.b = ft2.b AND ft1.c = ft2.c FOR UPDATE;
>> A=# COMMIT;
>>
>> where the plan for the SELECT FOR UPDATE is
>>
>> LockRows
>> -> Nested Loop
>>    -> Seq Scan on t
>>    -> Foreign Scan on
>>       Remote SQL: SELECT * FROM ft1 JOIN ft2 WHERE ft1.c = ft2.c
>>       AND ft1.a = $1 AND ft2.b = $2
>>
>> If an EPQ recheck is invoked by A's UPDATE, just recomputing the
>> original join tuple from the whole-row image that you proposed would
>> output an incorrect result in the EPQ recheck, since the value a in the
>> updated version of a to-be-joined tuple in t would no longer match the
>> value ft1.a extracted from the whole-row image if A's UPDATE has
>> committed successfully. So I think we would need to rejoin the tuples
>> populated from the whole-row images for the baserels ft1 and ft2, by
>> executing the secondary plan with the new parameter values for a and b.
>
> No. You just need to populate fdw_recheck_quals correctly, same as
> for the scan case.

Yeah, I think we can probably do that for the case where a pushed-down join clause is an inner-join one, but I'm not sure that we can do that for the case where that clause is an outer-join one. Maybe I'm missing something, though.

Best regards,
Etsuro Fujita
Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
> On Mon, Oct 19, 2015 at 12:17 AM, Kouhei Kaigai wrote:
> > 1. Fetch every EPQ slot of base relations involved in this join.
> >    In case of ForeignScan, all the required tuples of base relations
> >    should be filled because they are preliminarily fetched by whole-row
> >    var if early row-locking, or by RefetchForeignRow if late row-locking.
> >    In case of CustomScan, it can call ExecProcNode() to generate the
> >    first tuple even if it does not exist.
> >    Anyway, I assume all the component tuples of this join can be fetched
> >    using the existing EPQ slots because they are owned by base relations.
> >
> > 2. The recheck callback fills up ss_ScanTupleSlot according to the
> >    fdw_scan_tlist or custom_scan_tlist. The callback knows the best way
> >    to reconstruct the joined tuple from the base relations' tuples
> >    fetched in step 1.
> >    For example, if the joined tuple consists of (t1.a, t1.b, t2.x, t3.s),
> >    the callback picks up 't1.a' and 't1.b' from the tuple fetched from
> >    the EPQ slot of t1, then puts these values in the 1st and 2nd slots.
> >    Also, it picks up 't2.x' from the tuple fetched from the EPQ slot of
> >    t2, then puts this value in the 3rd slot. Same as above for 't3'.
> >    At this point, ss_ScanTupleSlot gets filled up with the expected
> >    fields as if the join clauses were satisfied.
> >
> > 3. The recheck callback also checks the qualifiers of base relations
> >    that are pushed down. Because expression nodes kept in fdw_exprs or
> >    custom_exprs are initialized to reference ss_ScanTupleSlot in
> >    setrefs.c, it is more reasonable to run ExecQual after step 2.
> >    If one of the qualifiers of a base relation is evaluated as false,
> >    the recheck callback returns an empty slot.
> >
> > 4. The recheck callback also checks the join clauses joining the
> >    underlying base relations. For the same reason as in step 3, it is
> >    more reasonable to execute ExecQual after step 2.
> >    If one of the join clauses is evaluated as false, the recheck returns
> >    an empty slot.
> >    Otherwise, it returns ss_ScanTupleSlot, and ExecScan will process
> >    any further jobs.
>
> Hmm, I guess this would work. But it still feels unnatural to me. It
> feels like we haven't really pushed down the join. It's pushed down
> except when there's an EPQ check, and then it's not. So we need a
> whole alternate plan tree. With my proposal, we don't need that.
>
Even if we fetch whole rows from both sides, join pushdown still works, because we can receive fewer rows than with a local join plus two foreign scans. (If the planner works well, we can expect join paths that increase the number of rows to be dropped.) One downside of my proposal is the growth in width of individual rows; it is a trade-off.

The above approach requires no changes to the existing EPQ infrastructure, so its implementation design is clear. On the other hand, your approach will reduce traffic over the network; however, it is still unclear how we integrate scanrelid == 0 with the EPQ infrastructure.

In the case of a custom scan that takes underlying local scan nodes, any kind of ROW_MARK_* other than ROW_MARK_COPY can happen. I think the width of the joined tuples is a relatively minor issue there compared with the FDW case. However, we cannot expect the fetched rows to be protected by an early row-locking mechanism, so the probability of re-fetching rows and reconstructing the joined tuple has relatively higher priority.

> There is also some possible loss of efficiency with this approach.
> Suppose that we have two tables ft1 and ft2 which are being joined,
> and we push down the join. They are being joined on an integer
> column, and the join needs to select several other columns as well.
> However, ft1 and ft2 are very wide tables that also contain some text
> columns. The query is like this:
>
> SELECT localtab.a, ft1.p, ft2.p FROM localtab LEFT JOIN (ft1 JOIN ft2
> ON ft1.x = ft2.x AND ft1.huge ~ 'stuff' AND ft2.huge2 ~ 'nonsense') ON
> localtab.q = ft1.q;
>
> If we refetch each row individually, we will need a wholerow image of
> ft1 and ft2 that includes all columns, or at least ft1.huge and
> ft2.huge2. If we just fetch a wholerow image of the join output, we
> can exclude those. The only thing we need to recheck is that it's
> still the case that localtab.q = ft1.q (because the value of
> localtab.q might have changed).
>
Isn't it possible to distinguish the whole-row references required by the locking mechanism from the ones required by users? (Does resjunk=true give us a hint?) When a whole-row reference is required only internally, it seems to me harmless to put dummy NULLs in the unreferenced columns. Is that a feasible idea?

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei
Re: [HACKERS] Multi-column distinctness.
Hello Kyotaro-san,

On 09/11/2015 06:58 PM, Tomas Vondra wrote:
> Maybe the best solution is to abandon the ALTER TABLE approach
> entirely, and instead invent a new set of commands:
>
>     CREATE STATISTICS
>     DROP STATISTICS
>
> (ALTER STATISTICS seems a bit excessive at this point.)

Another thing is that perhaps we should add names for statistics, just like we do for constraints, for example. Otherwise the DROP STATISTICS handling is rather awkward - for example, if the user creates stats twice by mistake, he's unable to drop just one of them.

Do you think this modified syntax makes sense? I'll have time to hack on this over the next few days.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
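[For illustration, one possible shape of those commands, with constraint-like names - a sketch only, since the exact syntax and the kinds of statistics collected are precisely what is under discussion here:

    CREATE STATISTICS s1 ON (a, b) FROM t1;
    CREATE STATISTICS s2 ON (a, b) FROM t1;  -- duplicate created by mistake
    DROP STATISTICS s2;                      -- names make dropping just one possible
]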
Re: [HACKERS] Why no CONSTANT for row variables in plpgsql?
Jim Nasby writes:
> What did seem odd is that while processing the DECLARE section there
> were plpgsql datums for tt.a and tt.b. I would have expected the
> assignment to produce a row datum of type tt.

Yeah, that's the thing that's weird about plpgsql's ROW datums. What the row datum mechanism is actually good for IMO is representing multiple targets for FOR and INTO constructs, ie

    SELECT ... INTO a,b,c;

If you look at the representation of INTO, it only allows one target datum, and the reason that's OK is it uses a row datum for cases like this. The row member datums are just the scalar variables a,b,c, which can also be accessed directly.

IMO, we ought to get rid of the use of that representation for composite-type variables and use the RECORD code paths for them, whether they are declared as type record or as named composite types. That would probably make it easier to handle this case, and it'd definitely make it easier to deal with some other weak spots like ALTER TYPE changes to composite types. However, last time I proposed that, it was shot down on the grounds that it might hurt performance in some cases. (Which is likely true, although that argument ignores the fact that other cases might get better.)

			regards, tom lane
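[For reference, a minimal example of the multi-target INTO case described above - the three scalar variables are collected into a single row datum internally, yet remain directly accessible:

    DO $$
    DECLARE
      a int; b int; c int;
    BEGIN
      SELECT 1, 2, 3 INTO a, b, c;  -- one row-datum target built over a, b, c
      RAISE NOTICE 'a=% b=% c=%', a, b, c;
    END $$;
]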
Re: [HACKERS] ROWS FROM(): A Foolish (In)Consistency?
On 10/19/15 1:07 PM, David Fetter wrote:
> What I'd like to do is lift the restriction on ROWS FROM(), which
> currently requires that the stuff inside the parentheses be
> set-returning functions, so that constructs like the following would
> actually work:
>
> SELECT *
> FROM ROWS FROM (
>     (VALUES (...), ..., (...)),
>     (SELECT ... ),
>     (INSERT ... RETURNING ... ),
>     my_srf()
> )
> AS t(...)

There have been a few places where I would have found that handy.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] [PATCH] SQL function to report log message
On 10/19/15 1:09 AM, Pavel Stehule wrote:
>> What I was trying to say is that if the argument to a USING option is
>> NULL then RAISE should skip over it, as if it hadn't been applied at
>> all. Similar to how the code currently tests for \0.
>
> I understand, but I don't prefer this behavior. NULL is a strange value
> and should be signaled.

So instead of raising the message we wanted, we throw a completely different exception? How does that make sense?

> It is partially wrong because we handle all fields the same. It makes
> sense for the "message" field, and does not make sense for other fields.
> In this case the text "NULL" would be better.

I fail to see how doing HINT: NULL is much better than just not raising a HINT at all... More to the point, if RAISE operated this way then it would be trivial to create a fully functional plpgsql wrapper around it.

> I have a different opinion - better to have the proposed function in
> core. As far as I know, NULL is not used in Postgres as an "ignore
> value", and I think that is a good thing.

Normally I'd agree, but in this case I think it's in line with what the C code is already doing by testing for \0. I suppose if we get the function it's not that bad since at least we get the functionality, so I'll stop arguing it.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
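[For context, the behavior under discussion: as of current plpgsql, passing NULL to a USING option aborts the RAISE itself rather than skipping the option -

    DO $$
    BEGIN
      RAISE NOTICE 'something happened' USING HINT = NULL;
    END $$;
    -- ERROR:  RAISE statement option cannot be null
]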
Re: [HACKERS] Why no CONSTANT for row variables in plpgsql?
On 10/19/15 5:16 PM, Jim Nasby wrote:
> Yeah, was hoping someone knew offhand why this was a problem. Guess
> I'll rip the checks out and see what explodes.

... and what blows up is exec_eval_datum():

        case PLPGSQL_DTYPE_ROW:
            {
                PLpgSQL_row *row = (PLpgSQL_row *) datum;
                HeapTuple   tup;

                if (!row->rowtupdesc)   /* should not happen */
                    elog(ERROR, "row variable has no tupdesc");
                /* Make sure we have a valid type/typmod setting */
                BlessTupleDesc(row->rowtupdesc);
                oldcontext = MemoryContextSwitchTo(estate->eval_econtext->ecxt_per_tuple_memory);
                tup = make_tuple_from_row(estate, row, row->rowtupdesc);
                if (tup == NULL)        /* should not happen */
                    elog(ERROR, "row not compatible with its own tupdesc");

running this:

    create type tt as (a int, b int);
    do $$
    declare
      c CONSTANT tt := '(1,2)'::tt;
    begin
      raise notice 'c = %', c;
    end $$;

which produces, in the server log:

    ERROR:  row not compatible with its own tupdesc
    CONTEXT:  PL/pgSQL function inline_code_block line 5 at RAISE
    STATEMENT:  do $$ declare c CONSTANT tt := '(1,2)'::tt; begin raise notice 'c = %', c; end $$;

and in the client:

    ERROR:  row not compatible with its own tupdesc
    CONTEXT:  PL/pgSQL function inline_code_block line 5 at RAISE

The row's tupdesc looked normal to me. What did seem odd is that while processing the DECLARE section there were plpgsql datums for tt.a and tt.b. I would have expected the assignment to produce a row datum of type tt. When exec_stmt_block is finally called, the initialization for loop initializes tt.a and tt.b, but does nothing with c.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index 841a8d6..01b3c10 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -500,34 +500,16 @@ decl_statement: decl_varname decl_const decl_datatype decl_collate decl_notnull
 										$3, true);
 						if ($2)
 						{
-							if (var->dtype == PLPGSQL_DTYPE_VAR)
 								((PLpgSQL_var *) var)->isconst = $2;
-							else
-								ereport(ERROR,
-										(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-										 errmsg("row or record variable cannot be CONSTANT"),
-										 parser_errposition(@2)));
 						}
 						if ($5)
 						{
-							if (var->dtype == PLPGSQL_DTYPE_VAR)
 								((PLpgSQL_var *) var)->notnull = $5;
-							else
-								ereport(ERROR,
-										(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-										 errmsg("row or record variable cannot be NOT NULL"),
-										 parser_errposition(@4)));
 						}
 						if ($6 != NULL)
 						{
-							if (var->dtype == PLPGSQL_DTYPE_VAR)
 								((PLpgSQL_var *) var)->default_val = $6;
-							else
-								ereport(ERROR,
-										(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-										 errmsg("default value for row or record variable is not supported"),
-
Re: [HACKERS] SuperUser check in pg_stat_statements
On Mon, Oct 19, 2015 at 3:12 PM, Jim Nasby wrote:
> On 10/19/15 3:48 PM, rajan wrote:
>> Thanks Stephen and Shulgin for your response.
>>
>> Will go through the patch and will try to solve my problem using that.
>>
>> My scenario is that I need to have a user who cannot be a superuser but
>> a monitoring user, who will be able to see all the queries executed by
>> all users.
>
> You can set that up today by defining a view on top of pg_stat_statements
> (or maybe it needs a SECDEF SRF... been a while since I've done it).

You can solve this using a security definer function created by a superuser; see https://gist.github.com/lfittl/9ee78ac200e4e7ebe33d for a full example.

--
Lukas Fittl
Skype: lfittl
Phone: +1 415 321 0630
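[A minimal sketch of such a wrapper - names are illustrative, the objects must be created by a superuser after the pg_stat_statements extension is installed, and "monitor" stands in for the monitoring role:

    CREATE FUNCTION pg_stat_statements_allusers()
    RETURNS SETOF pg_stat_statements
    LANGUAGE sql SECURITY DEFINER AS
    $$ SELECT * FROM pg_stat_statements $$;

    CREATE VIEW pg_stat_statements_all AS
      SELECT * FROM pg_stat_statements_allusers();

    GRANT SELECT ON pg_stat_statements_all TO monitor;

Because the function runs with the privileges of its superuser owner, the superuser check inside pg_stat_statements passes, and unprivileged callers of the view see all rows.]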
Re: [HACKERS] Why no CONSTANT for row variables in plpgsql?
On 10/18/15 10:16 PM, Tom Lane wrote:
> Jim Nasby writes:
>> Is there a particular reason why row and record variables can't be
>> CONSTANT in plpgsql?
>
> Well, you can't usefully do anything with such a variable unless it can
> be initialized, which isn't currently supported either:
>
> regression=# do $$ declare x int8_tbl := row(1,2); begin end $$;
> ERROR:  default value for row or record variable is not supported
> LINE 1: do $$ declare x int8_tbl := row(1,2); begin end $$;

Yeah, I assumed the two were related. We also don't allow NOT NULL. This is all checked in the production in pl_gram.y, but there's nothing indicating why this is the case. :/

> I have a vague recollection of having looked at this a few years ago and
> realizing it wasn't quite as trivial as one could wish. Don't remember
> why, though. In any case, I'm sure it's fixable if someone wants to put
> in enough effort.

Yeah, was hoping someone knew offhand why this was a problem. Guess I'll rip the checks out and see what explodes. :)

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
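[For contrast, the same declaration with a scalar type works fine today - the restriction is specific to row and record variables:

    DO $$
    DECLARE
      c CONSTANT int := 42;
    BEGIN
      RAISE NOTICE 'c = %', c;  -- NOTICE:  c = 42
    END $$;
]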
Re: [HACKERS] SuperUser check in pg_stat_statements
On 10/19/15 3:48 PM, rajan wrote:
> Thanks Stephen and Shulgin for your response.
>
> Will go through the patch and will try to solve my problem using that.
>
> My scenario is that I need to have a user who cannot be a superuser but
> a monitoring user, who will be able to see all the queries executed by
> all users.

You can set that up today by defining a view on top of pg_stat_statements (or maybe it needs a SECDEF SRF... been a while since I've done it).

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
On Mon, Oct 19, 2015 at 12:17 AM, Kouhei Kaigai wrote:
> 1. Fetch every EPQ slot of base relations involved in this join.
>    In case of ForeignScan, all the required tuples of base relations
>    should be filled because they are preliminarily fetched by whole-row
>    var if early row-locking, or by RefetchForeignRow if late row-locking.
>    In case of CustomScan, it can call ExecProcNode() to generate the
>    first tuple even if it does not exist.
>    Anyway, I assume all the component tuples of this join can be fetched
>    using the existing EPQ slots because they are owned by base relations.
>
> 2. The recheck callback fills up ss_ScanTupleSlot according to the
>    fdw_scan_tlist or custom_scan_tlist. The callback knows the best way
>    to reconstruct the joined tuple from the base relations' tuples
>    fetched in step 1.
>    For example, if the joined tuple consists of (t1.a, t1.b, t2.x, t3.s),
>    the callback picks up 't1.a' and 't1.b' from the tuple fetched from
>    the EPQ slot of t1, then puts these values in the 1st and 2nd slots.
>    Also, it picks up 't2.x' from the tuple fetched from the EPQ slot of
>    t2, then puts this value in the 3rd slot. Same as above for 't3'.
>    At this point, ss_ScanTupleSlot gets filled up with the expected
>    fields as if the join clauses were satisfied.
>
> 3. The recheck callback also checks the qualifiers of base relations
>    that are pushed down. Because expression nodes kept in fdw_exprs or
>    custom_exprs are initialized to reference ss_ScanTupleSlot in
>    setrefs.c, it is more reasonable to run ExecQual after step 2.
>    If one of the qualifiers of a base relation is evaluated as false,
>    the recheck callback returns an empty slot.
>
> 4. The recheck callback also checks the join clauses joining the
>    underlying base relations. For the same reason as in step 3, it is
>    more reasonable to execute ExecQual after step 2.
>    If one of the join clauses is evaluated as false, the recheck returns
>    an empty slot.
>    Otherwise, it returns ss_ScanTupleSlot, and ExecScan will process
>    any further jobs.

Hmm, I guess this would work. But it still feels unnatural to me. It feels like we haven't really pushed down the join. It's pushed down except when there's an EPQ check, and then it's not. So we need a whole alternate plan tree. With my proposal, we don't need that.

There is also some possible loss of efficiency with this approach. Suppose that we have two tables ft1 and ft2 which are being joined, and we push down the join. They are being joined on an integer column, and the join needs to select several other columns as well. However, ft1 and ft2 are very wide tables that also contain some text columns. The query is like this:

SELECT localtab.a, ft1.p, ft2.p FROM localtab LEFT JOIN (ft1 JOIN ft2
ON ft1.x = ft2.x AND ft1.huge ~ 'stuff' AND ft2.huge2 ~ 'nonsense') ON
localtab.q = ft1.q;

If we refetch each row individually, we will need a wholerow image of ft1 and ft2 that includes all columns, or at least ft1.huge and ft2.huge2. If we just fetch a wholerow image of the join output, we can exclude those. The only thing we need to recheck is that it's still the case that localtab.q = ft1.q (because the value of localtab.q might have changed).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] SuperUser check in pg_stat_statements
Thanks Stephen and Shulgin for your response.

Will go through the patch and will try to solve my problem using that.

My scenario is that I need to have a user who cannot be a superuser but a monitoring user, who will be able to see all the queries executed by all users.
Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
On Mon, Oct 19, 2015 at 3:45 AM, Etsuro Fujita wrote:
> As Tom mentioned, just recomputing the original join tuple is not good
> enough. We would need to rejoin the test tuples for the baserels even
> if ROW_MARK_COPY is in use. Consider:
>
> A=# BEGIN;
> A=# UPDATE t SET a = a + 1 WHERE b = 1;
> B=# SELECT * from t, ft1, ft2
>     WHERE t.a = ft1.a AND t.b = ft2.b AND ft1.c = ft2.c FOR UPDATE;
> A=# COMMIT;
>
> where the plan for the SELECT FOR UPDATE is
>
> LockRows
> -> Nested Loop
>    -> Seq Scan on t
>    -> Foreign Scan on
>       Remote SQL: SELECT * FROM ft1 JOIN ft2 WHERE ft1.c = ft2.c
>       AND ft1.a = $1 AND ft2.b = $2
>
> If an EPQ recheck is invoked by A's UPDATE, just recomputing the
> original join tuple from the whole-row image that you proposed would
> output an incorrect result in the EPQ recheck, since the value a in the
> updated version of a to-be-joined tuple in t would no longer match the
> value ft1.a extracted from the whole-row image if A's UPDATE has
> committed successfully. So I think we would need to rejoin the tuples
> populated from the whole-row images for the baserels ft1 and ft2, by
> executing the secondary plan with the new parameter values for a and b.

No. You just need to populate fdw_recheck_quals correctly, same as for the scan case.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Checkpoint throttling issues
Fabien COELHO wrote:

>> 4) It's a bit dubious to only pgstat_send_bgwriter() when on schedule.
>
> No opinion!

My guess here, without looking, is that this was based on the idea of "oops, we're late for the checkpoint, let's do as little work as possible to avoid getting even later", and thus skip some "unnecessary" tasks.

I would vote for fixing this (and the SIGHUP issue too). I'm not sure how bad a behavioral change this is; IMO these fixes should be backpatched all the way back.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Checkpoint throttling issues
> preexisting issues:
>
> 1) The progress passed to CheckpointWriteDelay() will often be wrong -
> it's calculated as num_written / num_to_write, but num_written is only
> incremented if the buffer hasn't since independently been written out.
> That's bad because it means we'll think we're further and further behind
> if there's independent writeout activity. Simple enough to fix, we gotta
> split num_written into num_written (for stats purposes) and num_processed
> (for progress). This is pretty much a bug, but I'm slightly worried about
> backpatching a fix because it can have a rather noticeable behavioural
> impact.

I noticed this discrepancy, but decided against doing anything about it, because it meant more arguing and explaining. In the pgbench run I used, the case does not seem to arise often because most I/O is performed through the checkpointer, so it does not matter much.

Another discrepancy is that there are two progress computations, one against time and one against WAL buffers, and when one is used and why it is better than the other is not too clear to me.

> 2) CheckpointWriteDelay()'s handling of config file changes et al is
> pretty weird: the config is reloaded during a checkpoint iff it's not an
> immediate checkpoint and we're on schedule. I see very little
> justification for having the got_SIGHUP block inside that if block.

No opinion.

> 3) The static pg_usleep(100000L); in CheckpointWriteDelay() is a bad
> idea. On a system with low write activity (< 10 writes/sec) this will
> lead to checkpoints finishing too early as there'll always be around ~10
> writes a second. On slow IO, say an sdcard, that's bad. On a system with
> a very high write throughput (say 1k+ buffers/sec) the unconditional
> 100ms sleep while on schedule will mean we're far behind after the
> sleep. This leads to rather bursty IO, and makes it more likely to run
> into dirty buffer limits and such. Just reducing the sleep from 100ms to
> 10ms leads to significant latency improvements. On pgbench rate limited
> to 5k tps:
>
> before:
> number of transactions skipped: 70352 (1.408 %)
> number of transactions above the 100.0 ms latency limit: 2817 (0.056 %)
> latency average: 5.266 ms
> latency stddev: 11.723 ms
> rate limit schedule lag: avg 3.138 (max 0.000) ms
>
> after:
> number of transactions skipped: 41696 (0.834 %)
> number of transactions above the 100.0 ms latency limit: 1736 (0.035 %)
> latency average: 4.929 ms
> latency stddev: 8.929 ms
> rate limit schedule lag: avg 2.835 (max 0.000) ms

Do these figures include sorting & flushing?

> I think the sleep time should be computed adaptively based on the number
> of buffers remaining and the remaining time. There's probably better
> formulations, but that seems like an easy enough improvement and
> considerably better than now.

On this run. If there are few I/Os then the 10ms will result in more random/discontinuous I/Os, because there would be only a few writes in each bucket, so I'm not sure whether this would be an unconditional improvement. I would be interested to have figures *with* sorting and flushing on, and several kinds of runs.

> 4) It's a bit dubious to only pgstat_send_bgwriter() when on schedule.

No opinion!

--
Fabien.
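[One possible shape of such an adaptive computation - a sketch only, with illustrative variable names, keeping the current 100ms as an upper bound:

    sleep_ms = clamp(time_remaining_in_schedule / buffers_remaining, 1, 100)

i.e. spread the remaining schedule time evenly over the remaining buffers instead of sleeping a fixed 100ms per round.]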
Re: [HACKERS] checkpointer continuous flushing
Hello Andres,

>> Here is a v13, which is just a rebase after 1aba62ec.
>
> I'm working on this patch, to get it into a state I think it'd be
> committable.

I'll review it carefully. Also, if you can include some performance figures it would help, even if I'll do some more runs.

> In my performance testing it showed that calling PerformFileFlush() only
> at segment boundaries and in CheckpointWriteDelay() can lead to rather
> spikey IO - not that surprisingly. The sync in CheckpointWriteDelay() is
> problematic because it only is triggered while on schedule, and not when
> behind.

When behind, PerformFileFlush should be called on segment boundaries. The idea was not to go to sleep without flushing, and to do it as little as possible.

> My testing seems to show that just adding a limit of 32 buffers to
> FileAsynchronousFlush() leads to markedly better results.

Hmmm. 32 buffers means 256 KB, which is quite small. Not sure what a good "limit" would be. It could depend on whether the pages are close together or not.

> I wonder if mmap() && msync(MS_ASYNC) isn't a better replacement for
> sync_file_range(SYNC_FILE_RANGE_WRITE) than posix_fadvise(DONTNEED). It
> might even be possible to later approximate that on windows using
> FlushViewOfFile().

I'm not sure that mmap/msync can be used for this purpose, because it seems there is no real control over where the file is mmapped.

> As far as I can see the while (nb_spaces != 0)/NextBufferToWrite() logic
> doesn't work correctly if tablespaces aren't actually sorted. I'm
> actually inclined to fix this by simply removing the flag to
> enable/disable sorting.

I do not think that there is a significant downside to having sorting always on, but showing that requires being able to test, hence the GUC. The point of the GUC is to demonstrate that the feature is harmless :-)

> Having defined(HAVE_SYNC_FILE_RANGE) || defined(HAVE_POSIX_FADVISE) in so
> many places looks ugly, I want to push that into the underlying
> functions. If we add a different flushing approach we shouldn't have to
> touch several places that don't actually really care.

I agree that it is pretty ugly, but I do not think that you can remove them all. You need at least one for checking the GUC and one for enabling the feature. Maybe their number could be reduced if the functions were switched to do-nothing stubs which are called nevertheless, but I was not keen on leaving unused code around when there is neither sync_file_range nor posix_fadvise.

> I've replaced the NextBufferToWrite() logic with a binaryheap.h heap -
> seems to work well, with a bit less code actually.

Hmmm. I'll check. I'm still unconvinced that using a tree for a 2-3 element set in most cases is an improvement.

> I'll post this after some more cleanup & testing.

I'll have a look when it is ready.

> I've also noticed that the sleeping logic in CheckpointWriteDelay() isn't
> particularly good. In high-throughput workloads the 100ms sleep is too
> long, leading to bursty IO behaviour. If 1k+ buffers are written out per
> second, 100ms is a rather long sleep. For another, that we only sleep
> 100ms when the write rate is low makes the checkpoint finish rather
> quickly - on a slow disk (say a microsd) that can cause unnecessary
> slowdowns for concurrent activity. ISTM we should calculate the sleep
> time in a better way.

I also noted this point, but I'm not sure how to find a better approach, so I left it as it is. I tried 50 ms & 200 ms on some runs, without significant effect on performance for the tests I ran then. The point of having a not-too-small value is that it provides some significant work to the IO subsystem without overflowing it. On average it does not matter. I'm unsure how it would interact with flushing. So I decided not to do anything about it. Maybe it should be a GUC, but I would not know how to choose it.

> The SIGHUP behaviour is also weird. Anyway, this probably belongs on a
> new thread.

Probably. I did not try to look at that.

--
Fabien.
Re: [HACKERS] [PATCH] SQL function to report log message
Hi

2015-10-13 22:01 GMT+02:00 Robert Haas:

> This patch is marked as Ready for Committer in the CommitFest
> application. Here is my attempt to summarize the votes upthread:
>
> Tom Lane: plpgsql RAISE is sufficient; we don't need this.
> Pavel Stehule: could be replaced by custom function, but not against it.
> Robert Haas: plpgsql RAISE is sufficient; we don't need this.
> Jim Nasby: A generic wrapper around RAISE is not trivial, so +1.
> Andres Freund: I've written this function many times, so let's have it in core.
> David Fetter: I've written this function like 20 times, we should have it.
>
> I'm only -0 on this patch, so I won't yell and scream if some other
> committer is prepared to step up and get this committed, but I'm not
> excited about doing it myself on the strength of a weak consensus in
> which I'm not a participant. Any takers? I recommend that we allow
> this patch 30 days to attract an interested committer, and, if nobody
> volunteers in that time, that we mark it Rejected for lack of
> interest.

I am changing my opinion a little bit - the RAISE statement in plpgsql is really static for some purposes. The proposed function is much more dynamic due to its use of VARIADIC parameters. That isn't possible with the RAISE statement without some potentially difficult modifications (on which a good consensus would be hard to find).

A customized constructor for SPIError could do this work too, but installing and using plpythonu for one piece of functionality is not efficient.

Moreover, the proposed function is pretty simple and has no side effects, so its maintenance cost is not high.

Regards

Pavel

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
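[To illustrate the static nature Pavel refers to: the severity level of RAISE must be a syntactic keyword, so a plpgsql wrapper has to dispatch on it explicitly rather than passing it through - a minimal sketch, with the function name and the set of levels handled purely illustrative:

    CREATE FUNCTION report_log(level text, msg text) RETURNS void
    LANGUAGE plpgsql AS $$
    BEGIN
      CASE lower(level)
        WHEN 'debug'   THEN RAISE DEBUG '%', msg;
        WHEN 'notice'  THEN RAISE NOTICE '%', msg;
        WHEN 'warning' THEN RAISE WARNING '%', msg;
        ELSE RAISE EXCEPTION '%', msg;
      END CASE;
    END $$;
]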
Re: [HACKERS] pg_restore cancel TODO
On 10/19/2015 09:47 AM, Jeff Janes wrote:
> On Mon, Oct 19, 2015 at 9:37 AM, Bruce Momjian wrote:
>
> Sorry, I don't know how I managed to screw this up so much. pg_restore,
> not pg_upgrade. I've never looked into pg_restore much until recently,
> so my fingers just autocomplete to the one I'm more used to.
>
> Let me go fix the TODO page.
>
> (I'm pretty sure pg_upgrade terminates itself very early on upon missing
> extensions)

Yes it does.

JD

> Jeff.

--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
New rule for social situations: "If you think to yourself not even JD
would say this..." Stop and shut your mouth. It's going to be bad.
Re: [HACKERS] ROWS FROM(): A Foolish (In)Consistency?
On Mon, Oct 19, 2015 at 10:24:37AM -0700, David Fetter wrote:
> Folks,
>
> As I was learning how best to add native weighted statistics, coming
> soon, I noticed that our ROWS FROM() constructor takes only
> set-returning functions, gluing the outputs together side by side
> without a join condition of any kind. This is a handy capability,
> which I don't find elsewhere in our code, modulo horrible things
> involving WITH ORDINALITY and FULL JOIN.
>
> Did I miss something?
>
> If not, would it make sense to allow every set-returning construct
> inside ROWS FROM(), rather than just some of them?

Based on off-list feedback, this proposal is not as clear as I would have liked. Here's what we get with SRFs:

SELECT * FROM ROWS FROM (generate_series(1,5,2), generate_series(4,1,-1)) AS t(i,j);
 i | j
---+---
 1 | 4
 3 | 3
 5 | 2
   | 1
(4 rows)

Here's a similar construct, but not really, from LATERAL:

SELECT * FROM generate_series(1,5,2) AS s(i)
FULL JOIN LATERAL generate_series(4,1,-1) AS t(j) ON s.i=t.j;
 i | j
---+---
 1 | 1
   | 2
 3 | 3
   | 4
 5 |
(5 rows)

What I'd like to do is lift the restriction on ROWS FROM(), which currently requires that the stuff inside the parentheses be set-returning functions, so that constructs like the following would actually work:

SELECT *
FROM ROWS FROM (
    (VALUES (...), ..., (...)),
    (SELECT ... ),
    (INSERT ... RETURNING ... ),
    my_srf()
)
AS t(...)

What say?

Cheers,
David.
--
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
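[For comparison, the first ROWS FROM() result above can be emulated today with the "horrible" WITH ORDINALITY / FULL JOIN construct David mentions, pairing the two sets by row number rather than by value - a sketch:

    SELECT i, j
    FROM generate_series(1,5,2) WITH ORDINALITY AS s(i, ord)
    FULL JOIN generate_series(4,1,-1) WITH ORDINALITY AS t(j, ord)
      USING (ord)
    ORDER BY ord;
]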
Re: [HACKERS] [PATCH v3] GSSAPI encryption support
Stephen Frost writes:
> As for this patch, the reason I've not been as involved (beyond being
> ridiculously busy) is that Michael's environment, which at least appears
> perfectly reasonable (and works with PG unpatched), isn't working. If we
> can get that working (and I've not looked at what's happening, so I have
> no idea how easy or hard that would be), then I'd be a lot more excited
> to spend time doing review of the patch.

I would also really like to see this fixed. Unfortunately, I haven't been able to replicate; I followed Michael's (very nicely written) writeup and didn't see the issues that Michael did. The only thing I know about the problem is that it logs:

> psql: lost synchronization with server: got message type "S", length 22

which unfortunately could be a great many things. I've said this a couple times now, but I really do need more information - a traffic dump, a list of commands that were run, etc.; unfortunately, the surface here is pretty large, and while I totally am willing to believe there are bugs in the code I've written, I do not yet see them.

Attachment: signature.asc (PGP signature)
[HACKERS] ROWS FROM(): A Foolish (In)Consistency?
Folks,

As I was learning how best to add native weighted statistics, coming soon, I noticed that our ROWS FROM() constructor takes only set-returning functions, gluing the outputs together side by side without a join condition of any kind. This is a handy capability, which I don't find elsewhere in our code, modulo horrible things involving WITH ORDINALITY and FULL JOIN.

Did I miss something?

If not, would it make sense to allow every set-returning construct inside ROWS FROM(), rather than just some of them?

Cheers,
David.
--
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] pg_restore cancel TODO
On Mon, Oct 19, 2015 at 9:37 AM, Bruce Momjian wrote:
> On Mon, Oct 19, 2015 at 09:26:21AM -0700, Jeff Janes wrote:
>> It seems like gdb eats signals that you send a process while it is
>> being debugged, so it is hard to figure out what is going on. From
>> strace, it looks like the children do receive a signal but either
>> ignore it, or set a flag and then ignore that.
>>
>> They don't continue to load the entire dump file; they exit once they
>> complete the current assignment and ask the parent for more work.
>>
>> Could just be a matter of adding the local equivalent of
>> CHECK_FOR_INTERRUPTS in the part of the code that spools COPY data to
>> the backends? I'm not sure what would happen if it were in the
>> index/constraint building phase, I've never let it get that far when
>> it reported errors early on.
>>
>> (This is linux, sorry for not making that clear)
>
> Well, we are not running COPY in pg_upgrade, just the DDL commands.
> Index creation is on empty tables, so it should be very quick. What
> should basically happen is that the pg_restore child processes should
> exit as forked children, and then the backends for these pg_restore
> processes should then exit. My guess is that this problem is not
> pg_upgrade-specific as there is no signal control in pg_upgrade --- you
> are just getting the defaults.
>
> (Updated TODO to mention Linux.)

Sorry, I don't know how I managed to screw this up so much. pg_restore, not pg_upgrade. I've never looked into pg_restore much until recently, so my fingers just autocomplete to the one I'm more used to.

Let me go fix the TODO page.

(I'm pretty sure pg_upgrade terminates itself very early on upon missing extensions)

Jeff.
Re: [HACKERS] pg_restore cancel TODO
On Mon, Oct 19, 2015 at 09:26:21AM -0700, Jeff Janes wrote:
> It seems like gdb eats signals that you send a process while it is being
> debugged, so it is hard to figure out what is going on. From strace, it
> looks like the children do receive a signal but either ignore it, or set
> a flag and then ignore that.
>
> They don't continue to load the entire dump file; they exit once they
> complete the current assignment and ask the parent for more work.
>
> Could just be a matter of adding the local equivalent of
> CHECK_FOR_INTERRUPTS in the part of the code that spools COPY data to
> the backends? I'm not sure what would happen if it were in the
> index/constraint building phase, I've never let it get that far when it
> reported errors early on.
>
> (This is linux, sorry for not making that clear)

Well, we are not running COPY in pg_upgrade, just the DDL commands. Index creation is on empty tables, so it should be very quick. What should basically happen is that the pg_restore child processes should exit as forked children, and then the backends for these pg_restore processes should then exit. My guess is that this problem is not pg_upgrade-specific as there is no signal control in pg_upgrade --- you are just getting the defaults.

(Updated TODO to mention Linux.)

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +
Re: [HACKERS] pg_restore cancel TODO
On Mon, Oct 19, 2015 at 8:28 AM, Tom Lane wrote:
> Alvaro Herrera writes:
>> Jeff Janes wrote:
>>> I've added the TODO item:
>>>
>>> When pg_upgrade -j ... is interrupted (for example, ctrl-C from the
>>> keyboard) make it cancel the children processes.
>>>
>>> The context where this arises is that I want to populate data into a
>>> new installation compiled with a patch under review, but immediately
>>> get error messages indicating I forgot to install a required
>>> extension. I hit ctrl-C so I can fix the problem, but it keeps
>>> running anyway.
>
>> This looks more like a bug to me than a To-do item.
>
> Why doesn't the control-C kill all the child processes automatically?
> I'd have expected it to ...

It seems like gdb eats signals that you send a process while it is being debugged, so it is hard to figure out what is going on. From strace, it looks like the children do receive a signal but either ignore it, or set a flag and then ignore that.

They don't continue to load the entire dump file; they exit once they complete the current assignment and ask the parent for more work.

Could just be a matter of adding the local equivalent of CHECK_FOR_INTERRUPTS in the part of the code that spools COPY data to the backends? I'm not sure what would happen if it were in the index/constraint building phase, I've never let it get that far when it reported errors early on.

(This is linux, sorry for not making that clear)

Cheers,

Jeff
Re: [HACKERS] SuperUser check in pg_stat_statements
Rajan,

* rajan (vgmon...@gmail.com) wrote:
> When monitoring using pg_stat_statements I see that postgres by default
> conceals queries executed by other users from the user who is selecting
> the pg_stat_statements view.
>
> I have edited pg_stat_statements.c by disabling the superuser check
> function so that all queries will be visible to all users.
>
> Can this be posted as a patch to postgresql?

We don't want that to be generally viewable but rather something where an administrator can control who can see it. The current proposal for that is to have a set of default roles, one of which will have this ability.

The thread on that topic starts here:

http://www.postgresql.org/message-id/flat/20150508042928.gp30...@tamriel.snowman.net#20150508042928.gp30...@tamriel.snowman.net

With the latest patch here:

http://www.postgresql.org/message-id/2015093020.gm3...@tamriel.snowman.net

Thanks!

Stephen
Re: [HACKERS] a raft of parallelism-related bug fixes
On Sat, Oct 17, 2015 at 9:16 PM, Andrew Dunstan wrote:
> If all that is required is a #define, like CLOBBER_CACHE_ALWAYS, then no
> special buildfarm support is required - you would just add that to the
> animal's config file, more or less like this:
>
>     config_env =>
>     {
>         CPPFLAGS => '-DGRATUITOUSLY_PARALLEL',
>     },
>
> I try to make things easy :-)

Wow, that's great. So, I'll try to rework the test code I posted previously into something less hacky, and eventually add a #define like this so we can run it on the buildfarm. There are a few other things that need to get done before that really makes sense - like getting the rest of the bug fix patches committed - otherwise any buildfarm critters we add will just be permanently red.

Thanks to Noah and Stephen for your replies also - it is good to hear that if I spend the time to make this committable, somebody will use it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Dangling Client Backend Process
On Sat, Oct 17, 2015 at 4:52 PM, Alvaro Herrera wrote:
> Andres Freund wrote:
>> On 2015-10-14 17:33:01 +0900, Kyotaro HORIGUCHI wrote:
>>> If I recall correctly, he was concerned about killing backends
>>> running transactions which could be saved. I have some sympathy with
>>> that opinion.
>>
>> I still don't. Leaving backends alive after postmaster has died
>> prevents the auto-restart mechanism from working from there on. Which
>> means that we'll potentially continue happily after another backend has
>> PANICed and potentially corrupted shared memory. Which isn't all that
>> unlikely if postmaster isn't around anymore.
>
> I agree. When postmaster terminates without waiting for all backends to
> go away, things are going horribly wrong -- either a DBA has done
> something stupid, or the system is misbehaving. As Andres says, if
> another backend dies at that point, things are even worse -- the dying
> backend could have been holding a critical lwlock, for instance, or it
> could have corrupted shared buffers on its way out. It is not sensible
> to leave the rest of the backends in the system still trying to run just
> because there is no one there to kill them.

Yep. +1 for changing this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] pg_restore cancel TODO
On Mon, Oct 19, 2015 at 08:28:34AM -0700, Tom Lane wrote:
> Alvaro Herrera writes:
>> Jeff Janes wrote:
>>> I've added the TODO item:
>>>
>>> When pg_upgrade -j ... is interrupted (for example, ctrl-C from the
>>> keyboard) make it cancel the children processes.
>>>
>>> The context where this arises is that I want to populate data into a
>>> new installation compiled with a patch under review, but immediately
>>> get error messages indicating I forgot to install a required
>>> extension. I hit ctrl-C so I can fix the problem, but it keeps
>>> running anyway.
>
>> This looks more like a bug to me than a To-do item.
>
> Why doesn't the control-C kill all the child processes automatically?
> I'd have expected it to ...

I don't know. On Unix we use fork() and on Windows we use threads. It is not clear in the TODO list which platform this is for. I don't see any signal control in the pg_upgrade source code.

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +
Re: [HACKERS] pg_restore cancel TODO
Alvaro Herrera writes:
> Jeff Janes wrote:
>> I've added the TODO item:
>>
>> When pg_upgrade -j ... is interrupted (for example, ctrl-C from the
>> keyboard) make it cancel the children processes.
>>
>> The context where this arises is that I want to populate data into a
>> new installation compiled with a patch under review, but immediately
>> get error messages indicating I forgot to install a required extension.
>> I hit ctrl-C so I can fix the problem, but it keeps running anyway.
>
> This looks more like a bug to me than a To-do item.

Why doesn't the control-C kill all the child processes automatically? I'd have expected it to ...

			regards, tom lane
Re: [HACKERS] SuperUser check in pg_stat_statements
On Mon, Oct 19, 2015 at 5:15 PM, rajan wrote:
> Hello,
>
> When monitoring using pg_stat_statements I see that postgres by default
> conceals queries executed by other users from the user who is selecting
> the pg_stat_statements view.
>
> I have edited pg_stat_statements.c by disabling the superuser check
> function so that all queries will be visible to all users.

Well, you could see that's by design. What problem are you trying to solve with that?

> Can this be posted as a patch to postgresql?

This is not going to be accepted.

--
Alex
Re: [HACKERS] Support for N synchronous standby servers - take 2
On Wed, Oct 14, 2015 at 3:16 AM, Josh Berkus wrote:
> On 10/13/2015 11:02 AM, Masahiko Sawada wrote:
>> I thought that this feature for postgresql should be simple at the
>> first implementation. It would be good even if there are some
>> restrictions, such as on the nesting level or the group setting.
>> The other new approach that I came up with is:
>>
>> * Add a new parameter synchronous_replication_method (say s_r_method)
>>   which can have two values: 'priority' or 'quorum'.
>> * If s_r_method = 'priority', the value of s_s_names (e.g. 'n1,n2,n3')
>>   is handled using priority. It's the same as '[n1,n2,n3]' in the
>>   dedicated language.
>> * If s_r_method = 'quorum', the value of s_s_names is handled using
>>   quorum commit. It's the same as '(n1,n2,n3)' in the dedicated
>>   language.
>
> Well, the first question is: can you implement both of these things for
> 9.6, realistically? If you can implement them, then we can argue about
> configuration format later. It's even possible that the nature of your
> implementation will enforce a particular syntax.

Hi,

The attached patch is a rough patch which supports multiple synchronous replication using the other approach I sent before.

The new GUC parameters are:

* synchronous_standby_num, which specifies the number of standby servers using sync rep. (default is 0)
* synchronous_replication_method, which specifies the replication method: priority or quorum. (default is priority)

The behaviour of 'priority' and 'quorum' is the same as what we've been discussing, but I'll write an overview of them again here.

[Priority Method]
Each standby server has a different priority, and the active standby servers having the top N priorities become sync standby servers. If synchronous_standby_names = '*', all active standby servers would be sync standby servers. If you want to set up standbys as in 9.5 or before, you can set synchronous_standby_num = 1.

[Quorum Method]
The standby servers all have the same priority 1, and all the active standby servers will be sync standby servers. The master server has to wait for ACKs from at least N sync standby servers before COMMIT. If synchronous_standby_names = '*', all active standby servers would be sync standby servers.

[Use case]
This patch can handle the main use case Michael described: there are 2 data centers, the first one has the master and a sync standby, and the second one has a set of standbys. We need to be sure that the standby in DC1 acknowledges all the time, and we would only need to wait for one or more of them in DC2. In order to handle this use case, you set up these standbys and the GUC parameters as follows.

* synchronous_standby_names = 'DC1, DC2'
* synchronous_standby_num = 2
* synchronous_replication_method = quorum
* The name of the standby server in DC1 is 'DC1', and the names of the two standby servers in DC2 are 'DC2'.

[Extensible]
By setting the same application_name on different standbys, we can set up sync replication with grouped standbys. If we want to set up replication more complexly and flexibly, we could add new syntax for s_s_names (e.g., a JSON format or a dedicated language) and more kinds of values for synchronous_replication_method, e.g. s_r_method = 'complex'. And this patch doesn't need a new parser for the GUC parameter.

Regards,

--
Masahiko Sawada

Attachment: synchronous_replication_method_v1.patch
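[For the DC1/DC2 use case above, the settings would look something like this - a sketch against the proposed GUCs, which exist in no released version; host names are illustrative:

    # postgresql.conf on the master
    synchronous_standby_names = 'DC1, DC2'
    synchronous_standby_num = 2
    synchronous_replication_method = quorum

    # recovery.conf on each DC2 standby - grouping is done by giving
    # all members of the group the same application_name
    primary_conninfo = 'host=master port=5432 application_name=DC2'
]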
[HACKERS] SuperUser check in pg_stat_statements
Hello,

When monitoring using pg_stat_statements I see that postgres by default conceals queries executed by other users from the user who is selecting the pg_stat_statements view.

I have edited pg_stat_statements.c by disabling the superuser check function so that all queries will be visible to all users.

Can this be posted as a patch to postgresql?

--
View this message in context: http://postgresql.nabble.com/SuperUser-check-in-pg-stat-statements-tp5870589.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_restore cancel TODO
On Mon, Oct 19, 2015 at 11:51:00AM -0300, Alvaro Herrera wrote:
> Jeff Janes wrote:
> > I've added the TODO item:
> >
> > When pg_upgrade -j ... is interrupted (for example, ctrl-C from the
> > keyboard) make it cancel the children processes.
> >
> > The context where this arises is that I want to populate data into a new
> > installation compiled with a patch under review, but immediately get error
> > messages indicating I forgot to install a required extension. I hit ctrl-C
> > so I can fix the problem, but it keeps running anyway.
>
> This looks more like a bug to me than a To-do item.

Uh, many TODO items are bugs.

--
Bruce Momjian http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [COMMITTERS] pgsql: Define integer limits independently from the system definitions.
Hi Andres,

On Fri, Apr 3, 2015 at 12:45 AM, Andres Freund wrote:
> Define integer limits independently from the system definitions.
>
> In 83ff1618 we defined integer limits iff they're not provided by the
> system. That turns out not to be the greatest idea because there are
> different ways some datatypes can be represented. E.g. on OSX PG's 64bit
> datatype will be a 'long int', but OSX unconditionally uses 'long
> long'. That disparity then can lead to warnings, e.g. around printf
> formats.

This commit added PG_INT16_MIN/MAX; however, we still rely on SHRT_MIN and SHRT_MAX in some code paths. On all the platforms I looked at (OpenBSD, FreeBSD, MinGW, MSVC, OSX, Linux, Solaris) the SHRT_MIN and SHRT_MAX definitions are always the same. Still, shouldn't we replace SHRT_* with their PG_INT16_* equivalents to be completely independent? I am just wondering if you considered that.

Regards,
--
Michael
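The kind of change being suggested would look something like this at a hypothetical call site (illustrative only; the limits themselves are the ones defined in c.h):

    /* Before: depends on the platform's <limits.h> */
    if (value < SHRT_MIN || value > SHRT_MAX)
        ereport(ERROR,
                (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
                 errmsg("value out of range for int16")));

    /* After: uses PostgreSQL's own, representation-independent limits */
    if (value < PG_INT16_MIN || value > PG_INT16_MAX)
        ereport(ERROR,
                (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
                 errmsg("value out of range for int16")));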
Re: [HACKERS] pg_restore cancel TODO
Jeff Janes wrote:
> I've added the TODO item:
>
> When pg_upgrade -j ... is interrupted (for example, ctrl-C from the
> keyboard) make it cancel the children processes.
>
> The context where this arises is that I want to populate data into a new
> installation compiled with a patch under review, but immediately get error
> messages indicating I forgot to install a required extension. I hit ctrl-C
> so I can fix the problem, but it keeps running anyway.

This looks more like a bug to me than a To-do item.

--
Álvaro Herrera  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improve the concurrency of vacuum full table and select statement on the same relation
The lock upgrade for VACUUM FULL tends to cause deadlocks with other lock-upgrading transactions, i.e. transactions that go from AccessShareLock to some lockmode > AccessShareLock. Tom Lane's concern is that this would make VACUUM FULL fail after it has already done a lot of work. But if we can always arrange for the other transaction to be the one that fails to break the deadlock, instead of the VACUUM FULL, how about this lock-upgrade solution? For example, we could enlarge 'DeadlockTimeout' for the VACUUM FULL transaction so that it avoids running the deadlock check. A sketch of the deadlock in question follows this message.

Jinyu Zhang

regards

At 2015-10-16 23:04:51, "Robert Haas" wrote:
>On Thu, Oct 15, 2015 at 8:28 PM, Jim Nasby wrote:
>> It's just how the authors of pg_repack decided to handle it. It seems pretty
>> reasonable, since you probably don't want an errant DDL statement to cause
>> the rollback of hours or days of pg_repack work.
>>
>> Ultimately, I don't think you'll find many people interested in working on
>> this, because the whole goal is to never need VACUUM FULL or pg_repack. If
>> you're clustering just for the sake of clustering, that has its own set of
>> difficulties that should be addressed.
>
>I think the topic of online table reorganization is a pretty important
>one, actually. That is a need that we have had for a long time,
>creates serious operational problems for users, and it's also a need
>that is not going away. I think the chances of eliminating that need
>completely, even if we rearchitected our heap storage, are nil.
>
>I think the bigger issue is that it's a very hard problem to solve.
>pg_repack is one approach, but I've heard more than one person say
>that, as C-3PO said about the asteroid, it may not be entirely stable.
>
>--
>Robert Haas
>EnterpriseDB: http://www.enterprisedb.com
>The Enterprise PostgreSQL Company
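For reference, the classic lock-upgrade deadlock looks like this with two sessions (table name illustrative):

    -- Session 1
    BEGIN;
    SELECT 1 FROM t LIMIT 1;                  -- takes AccessShareLock on t

    -- Session 2
    BEGIN;
    SELECT 1 FROM t LIMIT 1;                  -- takes AccessShareLock on t

    -- Session 1
    LOCK TABLE t IN ACCESS EXCLUSIVE MODE;    -- waits for session 2

    -- Session 2
    LOCK TABLE t IN ACCESS EXCLUSIVE MODE;    -- waits for session 1: deadlock

Whichever session's deadlock_timeout fires first runs the deadlock check and is the one cancelled, which is the idea behind enlarging it for the VACUUM FULL side.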
Re: [HACKERS] PATCH: 9.5 replication origins fix for logical decoding
Hi all

Patch revision 3 attached. It's a one-liner, with just the fix, and an explanatory note in the patch header. I'll leave the test changes for now, per Andres's feedback.

I'll write up a proposed addition to the replication origin docs that explains the existence of separate origins for each change and for the tx as a whole, and explains how replication origins interact with transaction commit timestamps. That'll come as a followup; I think it's worth getting the bug fix in now.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From 8c4285402706329a64a3f180c66a04357a932b3d Mon Sep 17 00:00:00 2001
From: Craig Ringer
Date: Mon, 19 Oct 2015 21:37:19 +0800
Subject: [PATCH] Send replication origin on commit

The transaction replication origin is not set when decoding a commit.
Origins for individual replication changes are passed correctly, as is
the origin when calling the transaction filter. Fix that, supplying the
transaction origin to logical decoding callbacks.

Prior to this change the ReorderBufferTXN->origin_id (usually:
txn->origin_id) is always InvalidRepOriginId in decoding callbacks other
than the origin filter.

See the thread
http://www.postgresql.org/message-id/CAMsr+YFhBJLp=qfsz3-j+0p1zlke8znxm2otycn20qrmx38...@mail.gmail.com
for details.
---
 src/backend/replication/logical/decode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index c629da3..9f60687 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -450,7 +450,7 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 {
 	XLogRecPtr	origin_lsn = InvalidXLogRecPtr;
 	XLogRecPtr	commit_time = InvalidXLogRecPtr;
-	XLogRecPtr	origin_id = InvalidRepOriginId;
+	XLogRecPtr	origin_id = XLogRecGetOrigin(buf->record);
 	int			i;
 
 	if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
--
2.1.0

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
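For context, what the fix enables is that an output plugin's commit callback can now rely on txn->origin_id. A minimal sketch of such a callback (the name is my own; the signature is the standard logical decoding commit callback):

    /*
     * Sketch: emit the replication origin with each commit.  Before the
     * fix, txn->origin_id seen here was always the invalid origin.
     */
    static void
    my_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
                  XLogRecPtr commit_lsn)
    {
        OutputPluginPrepareWrite(ctx, true);
        appendStringInfo(ctx->out, "COMMIT (origin=%u)",
                         (unsigned) txn->origin_id);
        OutputPluginWrite(ctx, true);
    }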
Re: [HACKERS] pg_restore cancel TODO
On Wed, Oct 14, 2015 at 09:34:04AM -0700, Jeff Janes wrote:
> I've added the TODO item:
>
> When pg_upgrade -j ... is interrupted (for example, ctrl-C from the keyboard)
> make it cancel the children processes.
>
> The context where this arises is that I want to populate data into a new
> installation compiled with a patch under review, but immediately get error
> messages indicating I forgot to install a required extension. I hit ctrl-C so
> I can fix the problem, but it keeps running anyway.

OK, we will need to look at that at some point. I am not sure we have any mechanism now to close those parallel processes.

--
Bruce Momjian http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
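Should someone pick this up, one conceivable shape for such a mechanism is a SIGINT handler in the parent that forwards the signal to the forked children before exiting — a sketch only, with illustrative bookkeeping, not pg_upgrade's actual code:

    #include <signal.h>
    #include <unistd.h>

    #define MAX_CHILDREN 64

    static pid_t child_pids[MAX_CHILDREN];  /* recorded as workers are forked */
    static int   n_children = 0;

    static void
    cancel_children(int signum)
    {
        /* forward the cancel to every live child, then exit ourselves */
        for (int i = 0; i < n_children; i++)
            kill(child_pids[i], SIGTERM);
        _exit(1);
    }

    /* in main(), before forking workers: signal(SIGINT, cancel_children); */

On Windows the parallel children are threads rather than processes, so the equivalent cleanup would have to look different there.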
Re: [HACKERS] fs issues on software raid0 (PG_VERSION does not contain valid data)
Hi,

On 10/18/2015 08:37 PM, Andres Freund wrote:
> Hi,
>
> On 2015-10-18 20:25:29 +0200, Tomas Vondra wrote:
>> I've been doing a lot of filesystem testing / benchmarking recently, and
>> today I've run into a really strange issue with ext4 on two SSD devices
>> in a RAID-0 configuration (Linux software raid).
>>
>> All this happens on an ext4 filesystem, created on a sw raid0 managed by
>> kernel 4.0.4. The filesystem is created like this:
>>
>> mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sda1 /dev/sdb1
>> mkfs.ext4 -E stride=128,stripe-width=256 /dev/md0
>
> Sounds like
> http://git.neil.brown.name/?p=md.git;a=commitdiff;h=a81157768a00e8cf8a7b43b5ea5cac931262374f
>
> Try reproducing with 4.0.5 to verify, that should contain the fix.

Yep, that seems to have fixed the issue. I had actually looked at that fix before, but concluded that it's unrelated, for some reason ...

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
> -Original Message-
> From: Etsuro Fujita [mailto:fujita.ets...@lab.ntt.co.jp]
> Sent: Monday, October 19, 2015 8:52 PM
> To: Kaigai Kouhei(海外 浩平); Kyotaro HORIGUCHI
> Cc: pgsql-hackers@postgresql.org; shigeru.han...@gmail.com;
> robertmh...@gmail.com
> Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
>
> I wrote:
> >> As Robert mentioned in [1], I think that if we're inside EPQ,
> >> pushed-down quals and/or pushed-down joins should be locally rechecked
> >> in the same way as other cases such as IndexRecheck. So, I'll propose
> >> the updated version of the patch.
>
> On 2015/10/16 18:48, Kouhei Kaigai wrote:
> > You have never answered my question for two months.
> >
> > I never denied executing the pushed-down qualifiers locally.
> > It is likely the best tactic in most cases.
> > But why do you try to force a particular manner on everyone?
> >
> > There are various kinds of FDW drivers. How do you guarantee it is
> > the best solution for all of them? It is basically impossible.
> > (Please google "Probatio diabolica")
> >
> > You try to add two special-purpose fields to ForeignScan:
> > fdw_recheck_plan and fdw_recheck_quals.
> > That requires FDW drivers to keep the pushed-down qualifiers in a
> > particular data format, and also requires FDW drivers to process the EPQ
> > recheck via an alternative local plan, even if some FDW drivers could
> > handle these jobs better with their own implementation.
> >
> > I've repeatedly pointed out this issue, but have never got a reasonable
> > answer from you.
> >
> > Again, I also admit an alternative plan may be a reasonable tactic for
> > most FDW drivers. However, only the FDW author can "decide" whether it is
> > the best tactic for handling the task in their module, not us.
> >
> > I don't think it is a good interface design to force people to
> > follow a particular implementation manner. It should be at the
> > discretion of the extension.
>
> I think that if you think so, you should give at least one concrete
> example for that. Ideally accompanied by a demo of how that works well.
>
I previously showed an example situation:
http://www.postgresql.org/message-id/9a28c8860f777e439aa12e8aea7694f801138...@bpxm15gp.gisp.nec.co.jp

Then, your response was below:

| Thanks for the answer, but I'm still not convinced.
| I think the EPQ testing shown in that use-case would probably not be
| efficient, compared to the core's.

What I'm repeatedly talking about is the flexibility of the interface, not efficiency. If the core backend provides a good-enough EPQ recheck routine, an extension can call it - but that is a decision for its author to make. Why do you want to prohibit an extension from choosing its own implementation?

Also, I introduced the case of PG-Strom in the face-to-face meeting with you. PG-Strom has its own CPU-fallback routine to recover from GPU errors, so I would prefer to reuse that routine for EPQ rechecks rather than adding alternative-local-plan support here.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
I wrote:
>> As Robert mentioned in [1], I think that if we're inside EPQ,
>> pushed-down quals and/or pushed-down joins should be locally rechecked
>> in the same way as other cases such as IndexRecheck. So, I'll propose
>> the updated version of the patch.

On 2015/10/16 18:48, Kouhei Kaigai wrote:
> You have never answered my question for two months.
>
> I never denied executing the pushed-down qualifiers locally.
> It is likely the best tactic in most cases.
> But why do you try to force a particular manner on everyone?
>
> There are various kinds of FDW drivers. How do you guarantee it is
> the best solution for all of them? It is basically impossible.
> (Please google "Probatio diabolica")
>
> You try to add two special-purpose fields to ForeignScan:
> fdw_recheck_plan and fdw_recheck_quals.
> That requires FDW drivers to keep the pushed-down qualifiers in a
> particular data format, and also requires FDW drivers to process the EPQ
> recheck via an alternative local plan, even if some FDW drivers could
> handle these jobs better with their own implementation.
>
> I've repeatedly pointed out this issue, but have never got a reasonable
> answer from you.
>
> Again, I also admit an alternative plan may be a reasonable tactic for
> most FDW drivers. However, only the FDW author can "decide" whether it is
> the best tactic for handling the task in their module, not us.
>
> I don't think it is a good interface design to force people to
> follow a particular implementation manner. It should be at the
> discretion of the extension.

I think that if you think so, you should give at least one concrete example for that. Ideally accompanied by a demo of how that works well.

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
On 2015/10/16 19:03, Kouhei Kaigai wrote: *** 48,59 ExecScanFetch(ScanState *node, + /* +* Execute recheck plan and get the next tuple if foreign join. +*/ + if (scanrelid == 0) + { + (*recheckMtd) (node, slot); + return slot; + } Ensure the slot is empty if recheckMtd returned false, as base relation case doing so. Fixed. *** 347,352 ExecScanReScan(ScanState *node) { Index scanrelid = ((Scan *) node->ps.plan)->scanrelid; + if (scanrelid == 0) + return; /* nothing to do */ + Assert(scanrelid > 0); estate->es_epqScanDone[scanrelid - 1] = false; Why nothing to do? Base relations managed by ForeignScan are tracked in fs_relids bitmap. I think the estate->es_epqScanDone flag should be initialized when we do ExecScanReSacn for each of the component ForeignScanState nodes in the local join execution plan state tree. As you introduced a few days before, if ForeignScan has parametalized remote join, EPQ slot contains invalid tuples based on old outer tuple. Maybe my explanation was not enough, but I haven't said such a thing. The problem in that case is that just returning the previously-returned foeign-join tuple would produce an incorrect result if an outer tuple to be joined has changed due to a concurrent transaction, as explained upthread. (I think that the EPQ slots would contain valid tuples.) Attached is an updated version of the patch. Other changes: * remove unnecessary memory-context handling for the foreign-join case in ForeignRecheck * revise code a bit and add a bit more comments Thanks for the comments! Best regards, Etsuro Fujita *** a/contrib/file_fdw/file_fdw.c --- b/contrib/file_fdw/file_fdw.c *** *** 525,530 fileGetForeignPaths(PlannerInfo *root, --- 525,531 total_cost, NIL, /* no pathkeys */ NULL, /* no outer rel either */ + NULL, /* no alternative path */ coptions)); /* *** a/contrib/postgres_fdw/postgres_fdw.c --- b/contrib/postgres_fdw/postgres_fdw.c *** *** 560,565 postgresGetForeignPaths(PlannerInfo *root, --- 560,566 fpinfo->total_cost, NIL, /* no pathkeys */ NULL, /* no outer rel either */ + NULL, /* no alternative path */ NIL); /* no fdw_private list */ add_path(baserel, (Path *) path); *** *** 727,732 postgresGetForeignPaths(PlannerInfo *root, --- 728,734 total_cost, NIL, /* no pathkeys */ param_info->ppi_req_outer, + NULL, /* no alternative path */ NIL); /* no fdw_private list */ add_path(baserel, (Path *) path); } *** a/src/backend/executor/execScan.c --- b/src/backend/executor/execScan.c *** *** 48,59 ExecScanFetch(ScanState *node, * conditions. */ Index scanrelid = ((Scan *) node->ps.plan)->scanrelid; Assert(scanrelid > 0); if (estate->es_epqTupleSet[scanrelid - 1]) { - TupleTableSlot *slot = node->ss_ScanTupleSlot; - /* Return empty slot if we already returned a tuple */ if (estate->es_epqScanDone[scanrelid - 1]) return ExecClearTuple(slot); --- 48,67 * conditions. 
*/ Index scanrelid = ((Scan *) node->ps.plan)->scanrelid; + TupleTableSlot *slot = node->ss_ScanTupleSlot; + + if (scanrelid == 0) + { + /* Execute recheck plan and store result in the slot */ + if (!(*recheckMtd) (node, slot)) + ExecClearTuple(slot); /* would not be returned by scan */ + + return slot; + } Assert(scanrelid > 0); if (estate->es_epqTupleSet[scanrelid - 1]) { /* Return empty slot if we already returned a tuple */ if (estate->es_epqScanDone[scanrelid - 1]) return ExecClearTuple(slot); *** *** 347,352 ExecScanReScan(ScanState *node) --- 355,363 { Index scanrelid = ((Scan *) node->ps.plan)->scanrelid; + if (scanrelid == 0) + return;/* nothing to do */ + Assert(scanrelid > 0); estate->es_epqScanDone[scanrelid - 1] = false; *** a/src/backend/executor/nodeForeignscan.c --- b/src/backend/executor/nodeForeignscan.c *** *** 24,29 --- 24,30 #include "executor/executor.h" #include "executor/nodeForeignscan.h" + #include "executor/tuptable.h" #include "foreign/fdwapi.h" #include "utils/memutils.h" #include "utils/rel.h" *** *** 73,80 ForeignNext(ForeignScanState *node) --- 74,99 static bool ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot) { + Index scanrelid = ((Scan *) node->ss.ps.plan)->scanrelid; ExprContext *econtext; + if (scanrelid == 0) + { + TupleTableSlot *resul
[HACKERS] Checkpoint throttling issues
Hi,

While working on the checkpoint sorting/flushing patch I noticed a number of preexisting issues:

1) The progress passed to CheckpointWriteDelay() will often be wrong - it's calculated as num_written / num_to_write, but num_written is only incremented if the buffer hasn't since independently been written out. That's bad because it means we'll think we're further and further behind if there's independent writeout activity. Simple enough to fix: we gotta split num_written into num_written (for stats purposes) and num_processed (for progress).

This is pretty much a bug, but I'm slightly worried about backpatching a fix because it can have a rather noticeable behavioural impact.

2) CheckpointWriteDelay()'s handling of config file changes et al is pretty weird: the config is reloaded during a checkpoint iff it's not an immediate checkpoint and we're on schedule. I see very little justification for having the got_SIGHUP block inside that if block.

3) The static pg_usleep(100000L); in CheckpointWriteDelay() is a bad idea. On a system with low write activity (< 10 writes/sec) this will lead to checkpoints finishing too early, as there'll always be ~10 writes a second. On slow IO, say an sdcard, that's bad. On systems with a very high write throughput (say 1k+ buffers/sec) the unconditional 100ms sleep while on schedule will mean we're far behind after the sleep. This leads to rather bursty IO, and makes it more likely to run into dirty buffer limits and such. Just reducing the sleep from 100ms to 10ms leads to significant latency improvements. On pgbench rate limited to 5k tps:

before:
number of transactions skipped: 70352 (1.408 %)
number of transactions above the 100.0 ms latency limit: 2817 (0.056 %)
latency average: 5.266 ms
latency stddev: 11.723 ms
rate limit schedule lag: avg 3.138 (max 0.000) ms

after:
number of transactions skipped: 41696 (0.834 %)
number of transactions above the 100.0 ms latency limit: 1736 (0.035 %)
latency average: 4.929 ms
latency stddev: 8.929 ms
rate limit schedule lag: avg 2.835 (max 0.000) ms

I think the sleep time should be computed adaptively based on the number of buffers remaining and the remaining time (see the sketch after this message). There are probably better formulations, but that seems like an easy enough improvement and considerably better than now.

4) It's a bit dubious to only pgstat_send_bgwriter() when on schedule.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
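For point 3, one adaptive formulation might look like this (a sketch only; the variable names are illustrative, not the actual checkpointer code):

    /*
     * Spread the remaining time budget over the remaining buffers instead
     * of sleeping a fixed 100ms, clamping to the old value as an upper
     * bound so low-activity systems don't sleep longer than before.
     */
    double  secs_left = target_completion_time - now;
    int     bufs_left = num_to_write - num_processed;
    long    sleep_ms  = 100;            /* previous fixed sleep, as a cap */

    if (bufs_left > 0 && secs_left > 0)
        sleep_ms = Min(sleep_ms, (long) (1000.0 * secs_left / bufs_left));

    pg_usleep(Max(sleep_ms, 1L) * 1000L);   /* pg_usleep takes microseconds */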
Re: [HACKERS] Some questions about the array.
Hello again.

I attached a simple patch for omitted boundaries in the slice. This will simplify the writing of SQL. Instead of:

select arr[2:array_upper(arr, 1)];

you can write:

select arr[2:];

simple and elegant. Omitted boundaries are prohibited in UPDATE. A few usage examples follow the patch.

Thanks.

--
YUriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/array.sgml b/doc/src/sgml/array.sgml
index 4385a09..57614b7 100644
--- a/doc/src/sgml/array.sgml
+++ b/doc/src/sgml/array.sgml
@@ -257,6 +257,25 @@ SELECT schedule[1:2][1:1] FROM sal_emp WHERE name = 'Bill';
 (1 row)
 
+ You can skip the lower-bound or upper-bound
+ to get the first or last element in a slice.
+
+
+SELECT schedule[:][:] FROM sal_emp WHERE name = 'Bill';
+
+ schedule
+
+ {{meeting,lunch},{training,presentation}}
+(1 row)
+
+SELECT schedule[:2][1:] FROM sal_emp WHERE name = 'Bill';
+
+ schedule
+
+ {{meeting,lunch},{training,presentation}}
+(1 row)
+
+
 If any dimension is written as a slice, i.e., contains a colon, then all
 dimensions are treated as slices. Any dimension that has only a single
 number (no colon) is treated as being from 1
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 29f058c..6643714 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -268,10 +268,12 @@ ExecEvalArrayRef(ArrayRefExprState *astate,
 	bool		eisnull;
 	ListCell   *l;
 	int			i = 0,
-				j = 0;
+				j = 0,
+				indexexpr;
 	IntArray	upper,
 				lower;
 	int		   *lIndex;
+	AnyArrayType *arrays;
 
 	array_source = ExecEvalExpr(astate->refexpr,
 								econtext,
@@ -293,6 +295,7 @@ ExecEvalArrayRef(ArrayRefExprState *astate,
 	foreach(l, astate->refupperindexpr)
 	{
 		ExprState  *eltstate = (ExprState *) lfirst(l);
+		eisnull = false;
 
 		if (i >= MAXDIM)
 			ereport(ERROR,
@@ -300,10 +303,23 @@ ExecEvalArrayRef(ArrayRefExprState *astate,
 					 errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)",
 							i + 1, MAXDIM)));
 
-		upper.indx[i++] = DatumGetInt32(ExecEvalExpr(eltstate,
-													 econtext,
-													 &eisnull,
-													 NULL));
+		if (eltstate == NULL && astate->refattrlength <= 0)
+		{
+			if (isAssignment)
+				ereport(ERROR,
+						(errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
+						 errmsg("cannot determine upper index for empty array")));
+			arrays = (AnyArrayType *) DatumGetArrayTypeP(array_source);
+			indexexpr = AARR_LBOUND(arrays)[i] + AARR_DIMS(arrays)[i] - 1;
+		}
+		else
+			indexexpr = DatumGetInt32(ExecEvalExpr(eltstate,
+												   econtext,
+												   &eisnull,
+												   NULL));
+
+		upper.indx[i++] = indexexpr;
+
 		/* If any index expr yields NULL, result is NULL or error */
 		if (eisnull)
 		{
@@ -321,6 +337,7 @@ ExecEvalArrayRef(ArrayRefExprState *astate,
 	foreach(l, astate->reflowerindexpr)
 	{
 		ExprState  *eltstate = (ExprState *) lfirst(l);
+		eisnull = false;
 
 		if (j >= MAXDIM)
 			ereport(ERROR,
@@ -328,10 +345,20 @@ ExecEvalArrayRef(ArrayRefExprState *astate,
 					 errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)",
 							j + 1, MAXDIM)));
 
-		lower.indx[j++] = DatumGetInt32(ExecEvalExpr(eltstate,
-													 econtext,
-													 &eisnull,
-													 NULL));
+		if (eltstate == NULL)
+		{
+			arrays = (AnyArrayType *) DatumGetArrayTypeP(array_source);
+			indexexpr = AARR_LBOUND(arrays)[j];
+		}
+		else
+			indexexpr = DatumGetInt32(ExecEvalExpr(eltstate,
+												   econtext,
+												   &eisnull,
+												   NULL));
+
+		lower.indx[j++] = indexexpr;
+
+		/* If any index expr yields NULL, result is NULL or error */
 		if (eisnull)
 		{
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 0b4ab23..6d9cad4 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2415,6 +2415,8 @@ _copyAIndices(const A_Indices *from)
 	COPY_NODE_FIELD(lidx);
 	COPY_NODE_FIELD(uidx);
+	COPY_SCALAR_FIELD(lidx_default);
+	COPY_SCALAR_FIELD(uidx_default);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index aa6e102..e75b448 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2162,6 +2162,8 @@ _equalAIndices(const A_Indices *a, const A_Indices *b)
 {
 	COMPARE_NODE_FIELD(lidx);
 	COMPARE_NODE_FIELD(uidx);
+	COMPARE_SCALAR_FIELD(lidx_default);
+	COMPARE_SCALAR_FIELD(uidx_default);
 
 	return true;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index df7f6e1..6769740 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2756,6 +2756,8 @@ _outA_Indices(StringInfo str, const A_Indices *node)
 	WRITE_NODE_FIELD(lidx);
 	WRITE_NODE_FIELD(uidx);
+	WRITE_BOOL_FIELD(lidx_default);
+	WRITE_BOOL_FIELD(uidx_default);
 }
 
 static void
diff --git a/src/backend/parser/gr
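With the patch applied, the shorthand forms would behave roughly like this (a sketch, assuming a column arr int[] containing '{1,2,3,4,5}'; tbl is an illustrative table name):

    SELECT arr[2:] FROM tbl;   -- same as arr[2:array_upper(arr, 1)]  => {2,3,4,5}
    SELECT arr[:3] FROM tbl;   -- same as arr[array_lower(arr, 1):3]  => {1,2,3}
    SELECT arr[:]  FROM tbl;   -- the whole array                     => {1,2,3,4,5}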
[HACKERS] plpgsql - DECLARE - cannot to use %TYPE or %ROWTYPE for composite types
Hi,

We cannot declare a variable whose type references another composite variable. This limit is probably artificial, because any composite type is a type too in PostgreSQL.

The issue: referencing composite variables doesn't work:

do $$ declare x int; y x%type; begin end; $$; -- ok
do $$ declare x pg_class; y x%type; begin end; $$; -- invalid type name "x%type"
do $$ declare x pg_class; y x%rowtype; begin end; $$; -- relation "x" does not exist

%ROWTYPE needs a record in pg_class. Probably we should not change that; the change could bring compatibility issues. So there are two possibilities:

1. %TYPE can be used for any kind of variable. This behaviour will be consistent with polymorphic parameters - we have "anyelement", and we do not have "anyrow".

2. Introduce a new keyword - %RECTYPE .. it can work, but there will be a gap relative to the polymorphic parameters.

Comments, notices?

Regards

Pavel
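Under option 1, the second failing example above would simply start working; a sketch of the intended behaviour (not something that works today):

    do $$
    declare
      x pg_class;
      y x%type;   -- y would be declared with x's composite type, pg_class
    begin
      y := x;
    end;
    $$;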
Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
On 2015/10/17 9:58, Robert Haas wrote:
> But with Etsuro Fujita's patch, and I think what you have proposed has
> been similar, how are you going to do it? The proposal is to call the
> recheck method and hope for the best, but what is the recheck method
> going to do? Where is it going to get the previously-returned tuple?
> As I explained in a previous email, just returning the
> previously-returned tuple is not good enough. How will it know if it
> has already returned it during the lifetime of this EPQ check?
> Offhand, it looks to me like, at least in some circumstances, you're
> probably going to return whatever tuple you returned most recently
> (which has a good chance of being the right one, but not necessarily)
> over and over again. That's not going to fly.

No. Since the local join execution plan is created so that the scan slot for each foreign table involved in the pushed-down join looks at its EPQ slot, I think the plan can return at most one tuple.

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
> On Fri, Oct 16, 2015 at 9:51 PM, Tom Lane wrote:
> > Robert Haas writes:
> >> Both you and Etsuro Fujita are proposing to fix this problem by
> >> somehow making it the FDW's problem to reconstruct the tuple
> >> previously produced by the join from whole-row images of the baserels.
> >> But that's not looking back far enough: why are we asking for
> >> whole-row images of the baserels when what we really want is a
> >> whole-row image of the output of the join? The output of the join is
> >> what we need to re-return.
> >
> > There are multiple components to the requirement though:
> >
> > 1. Recheck the rows that were in the baserels and possibly fetch updated
> > versions of them. (Once we're in EPQ, we want the most recent row
> > versions, not what the query snapshot can see.)
>
> Check. But postgres_fdw, and probably quite a few other FDWs, use
> early row locking. So ROW_MARK_COPY is in use and we need not worry
> about refetching rows.
>
> > 2. Apply relevant restriction clauses and see if the updated row versions
> > still pass the clauses.
>
> Check.
>
> > 3. If so, form a join row and return that. Else return NULL.
>
> Not check.
>
> Suppose we've got two foreign tables ft1 and ft2, using postgres_fdw.
> There is a local table t. The user does something like UPDATE t SET
> ... FROM ft1, ft2 WHERE t = ft1.a AND ft1.b = ft2.b AND ... The
> query planner generates something like:
>
> Update
> -> Join
>    -> Scan on t
>    -> Foreign Scan on
>
> If an EPQ recheck occurs, the only thing that matters is that the
> Foreign Scan return the right output row (or possibly no rows, if the
> row it would have formed no longer matches the quals). It doesn't
> matter how it does this. Let's say the columns actually needed by the
> query from the ft1-ft2 join are ft1.a, ft1.b, ft2.a, and ft2.b.
> Currently, the output of the foreign scan is something like: ft1.a,
> ft1.b, ft2.a, ft2.b, ft1.*, ft2.*. The EPQ recheck has access to ft1.*
> and ft2.*, but it's not straightforward for postgres_fdw to regenerate
> the join tuple from that. Maybe the pushed-down join was a left join,
> maybe it was a right join, maybe it was a full join. So some of the
> columns could have gone to NULL. To figure it out, you need to build
> a secondary plan tree that mimics the structure of the join you pushed
> down, which is kinda hairy.
>
In case of outer join, do we need to care about the join clause, unlike the scan qualifiers? Rows filled up with NULLs appear when there is no matching tuple on the other side. That means any row on the non-nullable side of the join is visible regardless of the join clause, as long as it satisfies the pushed-down scan qualifiers, whether or not it matches the latest rows refetched based on the latest values.

Example)

remote table: ft1

 id |  val
----+-------
  1 | 'aaa'
  2 | 'bbb'
  3 | 'ccc'

remote table: ft2

 id |  val
----+-------
  2 | 'xxx'
  3 | 'yyy'
  4 | 'zzz'

If the remote join query is:

SELECT *, ft1.*, ft2.* FROM ft1 LEFT JOIN ft2 ON ft1.id = ft2.id WHERE ft1.id < 3;

its expected result is:

 ft1.id | ft1.val | ft2.id | ft2.val |   ft1.*   |   ft2.*
--------+---------+--------+---------+-----------+-----------
      1 | 'aaa'   |  null  |  null   | (1,'aaa') |  null
      2 | 'bbb'   |     2  | 'xxx'   | (2,'bbb') | (2,'xxx')

The non-nullable side (ft1 in this case) is visible regardless of the join clause, as long as its tuples satisfy the scan qualifier (ft1.id < 3). The FDW/CSP knows which type of join it is responsible for, so it can skip evaluation of the join clauses and apply only the scan qualifiers to the base relations' tuples.

> Think how much easier your life would be if you hadn't bothered
> fetching ft1.* and ft2.*, which aren't so handy in this case, and had
> instead made the output of the foreign scan ft1.a, ft1.b, ft2.a,
> ft2.b, ROW(ft1.a, ft1.b, ft2.a, ft2.b) -- and that the output of that
> ROW() operation was stored in an EPQ slot. Now, you don't need the
> secondary plan tree any more. You've got all the data you need right
> in your hand. The values inside the ROW() constructor were evaluated
> after accounting for the goes-to-NULL effects of any pushed-down
> joins.
>
> This example is of the early row locking case, but I think the story
> is about the same if the FDW wants to do late row locking instead. If
> there's an EPQ recheck, it could issue individual row re-fetches
> against every base table and then re-do all the joins that it pushed
> down locally. But it would be faster and cleaner, I think, to send
> one query to the remote side that re-fetches all the rows at once, and
> whose target list is exactly what we need, rather than whole row
> targetlists for each baserel that then have to be rejiggered on our
> side.
>
Which approach is more reasonable? In case of early row locking, the FDW ensures all the rows involved in the join are protected from concurrent accesses, so there is no need to be concerned about refetching from the remote side. On the other hand, in case of late row locking, we need to pay a
Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
On 2015/10/17 12:22, Robert Haas wrote:
> On Fri, Oct 16, 2015 at 9:51 PM, Tom Lane wrote:
>> Robert Haas writes:
>>> Both you and Etsuro Fujita are proposing to fix this problem by
>>> somehow making it the FDW's problem to reconstruct the tuple
>>> previously produced by the join from whole-row images of the baserels.
>>> But that's not looking back far enough: why are we asking for
>>> whole-row images of the baserels when what we really want is a
>>> whole-row image of the output of the join? The output of the join is
>>> what we need to re-return.
>>
>> There are multiple components to the requirement though:
>>
>> 3. If so, form a join row and return that. Else return NULL.
>
> Not check.
>
> Suppose we've got two foreign tables ft1 and ft2, using postgres_fdw.
> There is a local table t. The user does something like UPDATE t SET
> ... FROM ft1, ft2 WHERE t = ft1.a AND ft1.b = ft2.b AND ... The
> query planner generates something like:
>
> Update
> -> Join
>    -> Scan on t
>    -> Foreign Scan on
>
> If an EPQ recheck occurs, the only thing that matters is that the
> Foreign Scan return the right output row (or possibly no rows, if the
> row it would have formed no longer matches the quals). It doesn't
> matter how it does this. Let's say the columns actually needed by the
> query from the ft1-ft2 join are ft1.a, ft1.b, ft2.a, and ft2.b.
> Currently, the output of the foreign scan is something like: ft1.a,
> ft1.b, ft2.a, ft2.b, ft1.*, ft2.*. The EPQ recheck has access to ft1.*
> and ft2.*, but it's not straightforward for postgres_fdw to regenerate
> the join tuple from that. Maybe the pushed-down join was a left join,
> maybe it was a right join, maybe it was a full join. So some of the
> columns could have gone to NULL. To figure it out, you need to build
> a secondary plan tree that mimics the structure of the join you pushed
> down, which is kinda hairy.

As Tom mentioned, just recomputing the original join tuple is not good enough. We would need to rejoin the test tuples for the baserels even if ROW_MARK_COPY is in use. Consider:

A=# BEGIN;
A=# UPDATE t SET a = a + 1 WHERE b = 1;
B=# SELECT * from t, ft1, ft2
      WHERE t.a = ft1.a AND t.b = ft2.b AND ft1.c = ft2.c FOR UPDATE;
A=# COMMIT;

where the plan for the SELECT FOR UPDATE is

LockRows
-> Nested Loop
   -> Seq Scan on t
   -> Foreign Scan on
        Remote SQL: SELECT * FROM ft1 JOIN ft2 WHERE ft1.c = ft2.c AND ft1.a = $1 AND ft2.b = $2

If an EPQ recheck is invoked by A's UPDATE, just recomputing the original join tuple from the whole-row image that you proposed would output an incorrect result in the EPQ recheck, since the value of a in the updated version of the to-be-joined tuple in t would no longer match the value of ft1.a extracted from the whole-row image if A's UPDATE has committed successfully. So I think we would need to rejoin the tuples populated from the whole-row images for the baserels ft1 and ft2, by executing the secondary plan with the new parameter values for a and b.

As for the secondary plan, I think we could create the corresponding local join execution path during GetForeignJoinPaths, (1) by looking at the pathlist of the joinrel RelOptInfo, which would already contain some local join execution paths, as the patch does, or (2) by calling a helper function that creates a local join execution path from given outer/inner paths selected from the pathlists of the outerrel/innerrel RelOptInfos, as proposed by KaiGai-san before. ISTM that the latter would be better, so I plan to propose such a function as part of the postgres_fdw join pushdown patch for 9.6.
> This example is of the early row locking case, but I think the story
> is about the same if the FDW wants to do late row locking instead. If
> there's an EPQ recheck, it could issue individual row re-fetches
> against every base table and then re-do all the joins that it pushed
> down locally. But it would be faster and cleaner, I think, to send
> one query to the remote side that re-fetches all the rows at once, and
> whose target list is exactly what we need, rather than whole row
> targetlists for each baserel that then have to be rejiggered on our
> side.

I agree with you on that point. (In fact, I thought that too!) But considering that many FDWs, including postgres_fdw, currently use early row locking (i.e., ROW_MARK_COPY), I'd like to leave that for future work.

>> I think what Kaigai-san and Etsuro-san are after is trying to find a
>> way to reuse some of the existing EPQ machinery to help with that.
>> This may not be practical, or it may end up being messier than a
>> standalone implementation; but it's not silly on its face to want to
>> reuse some of that code.
>
> Yeah, I think we're all in agreement that reusing as much of the EPQ
> machinery as is sensible is something we should do. We are not in
> agreement on which parts of it need to be changed or extended.

Agreed.

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers