Re: [HACKERS] CLUSTER FREEZE
On 2013-10-24 17:17:22 -0700, Josh Berkus wrote: On 10/24/2013 04:55 PM, Robert Haas wrote: On Thu, Oct 24, 2013 at 1:09 PM, Josh Berkus j...@agliodbs.com wrote: On 10/23/2013 09:58 PM, Amit Kapila wrote: I wonder why anyone would like to freeze during CLUSTER command when they already have separate way (VACUUM FREEZE) to achieve it, do you know or can think of any case where user wants to do it along with Cluster command? If I'm rewriting the table anyway, let's freeze it. Otherwise, you have to write the same pages twice, if both CLUSTER and FREEZE are required. I wonder if we should go so far as to make this the default behavior, instead of just making it an option. +1 from me. Can you think of a reason you *wouldn't* want to freeze? It makes content from the future appear when you start using the relation in a query/session with an older snapshot. Currently CLUSTER is safe against that. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch for fail-back without fresh backup
On Fri, Oct 25, 2013 at 5:57 AM, Magnus Hagander mag...@hagander.net wrote: In fact I've been considering suggesting we might want to retire the difference between archive and hot_standby as wal_level, because the difference is usually so small. And the advantage of hot_standby is in almost every case worth it. Even in the archive recovery mode, being able to do pause_at_recovery_target is extremely useful. And as you say in (c) above, many users don't realize that until it's too late. +1 on removing archive from wal_level. Having both archive and hot_standby for wal_level is confusing, and if I recall correctly hot_standby and archive have been kept as possible settings only to protect people from bugs that the newly-introduced hot_standby could introduce due to the few WAL records it adds. But it has been a couple of releases since there have been no such bugs, no? -- Michael
Re: [HACKERS] Re: Using indexes for ORDER BY and PARTITION BY clause in windowing functions
Hello, Agree that a windowing function will return all the rows, compared to max and group by returning only the max rows per group. But even while arriving at the aggregate/sorting, the windowing function seems to spend more effort than group by/order by. (I'll apologise in advance for possible misreading..) The main cause of the difference in time comes from sorting. Over 90% of the total execution time has elapsed while sorting (49ms-2733ms) for the one using the windowing function. If this sort were useless, the execution time would be less than 300 ms - comparable enough to the group-by query.

 Subquery Scan on __unnamed_subquery_0
     (actual time=2606.075..2953.937 rows=558 loops=1)
   Filter: (__unnamed_subquery_0.rn = 1)
   ->  WindowAgg (actual time=2606.063..2928.061 rows=122880 loops=1)
         ->  Sort (actual time=2606.020..2733.677 rows=122880 loops=1)
               Sort Key: student_score.course, student_score.score
               ->  Seq Scan on student_score
                     (actual time=0.009..49.026 rows=122880 loops=1)

As you see in the above plan, the sorting key is (course, score). If your point is the overall performance and not reusing a kind of 'hash', there's a big chance to eliminate this sorting if you are able to have an additional index, say,

=# create index idx_co_sc on student_score using btree (course, score);

With this index, you will get a different plan like this,

uniontest=# explain analyze select student_name from (select student_name, dense_rank() over(partition by course order by score) rn, score from student_score) rnn where rn=2;
                          QUERY PLAN
---------------------------------------------------------------
 Subquery Scan on rnn (actual time=0.088..319.403 rows=135 loops=1)
   Filter: (rnn.rn = 2)
   Rows Removed by Filter: 122746
   ->  WindowAgg (actual time=0.037..296.851 rows=122881 loops=1)
         ->  Index Scan using idx_co_sc on student_score
               (actual time=0.027..111.333 rows=122881 loops=1)
 Total runtime: 319.483 ms

Does this satisfy your needs? === Another thing, (I may be stupid and naive here) does PostgreSQL re-use the hash which has already been created for the sort?
In this case the inner query must have created a hash for the windowing aggregate. Can't we use that same one while applying the filter rn=1? Generally speaking, hashes cannot yield ordered output by their nature, I believe. The windowing function (execnode) always receives tuples sequentially in the window-defined order (as you see in the explained plan above), then processes the tuples in a semi tuple-by-tuple manner to perform per-frame aggregation, and finally outputs the same number of tuples as the input. Furthermore, dense_rank() doesn't even need per-frame aggregation. So none of the planners so far seems to have a chance to use a kind of hash table to calculate/execute windowing functions. On another point, automatically preserving some internal data within a query beyond the end of the query brings in a 'when to discard it?' problem. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
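The sequential, hash-free processing described above can be sketched outside the executor. The following is an illustrative Python model (not PostgreSQL's actual code) of dense_rank() consuming input already sorted by (course, score), the order the Index Scan delivers; the sample rows are invented:

```python
from itertools import groupby

def dense_rank(rows):
    """Assign dense_rank() per course over rows already sorted by
    (course, score), consuming the input strictly sequentially --
    no hash table is ever needed."""
    out = []
    for course, group in groupby(rows, key=lambda r: r[0]):
        rank, prev_score = 0, None
        for _course, score, name in group:
            if score != prev_score:
                rank += 1          # dense: no gaps after ties
                prev_score = score
            out.append((name, rank))
    return out

rows = [  # pre-sorted by (course, score), as the index scan would deliver
    ("math", 70, "ann"), ("math", 70, "bob"), ("math", 90, "cay"),
    ("phys", 60, "dan"), ("phys", 80, "eve"),
]
print(dense_rank(rows))
# [('ann', 1), ('bob', 1), ('cay', 2), ('dan', 1), ('eve', 2)]
```

Because the input is pre-ordered, each partition boundary and each rank change is detected by comparing only adjacent tuples, which is why the index scan plan above avoids the 2.7-second sort entirely.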
[HACKERS] Example query causing param_info to be set in plain rel path
Hi All, In order to test various cases of LATERAL join in Postgres-XC, I am trying to find a query where RelOptInfo->lateral_relids would get set for plain base relations. I tried various examples like the ones below, but none of them was able to set lateral_relids in the RelOptInfo of the involved relations. I am using the current master branch. select * from tab1 left join lateral (select * from tab2 where tab2.val = tab1.val2) q using(val); This just pulled up the qual to a higher level. select * from tab1 left join lateral (select sum(val2) sum_val2, val from tab2 where tab2.val2 = tab1.val2 group by tab2.val) q using(val); This at least caused the outer reference to be added as a parameter, but again no lateral_relids. Is there an example query (however impractical it would be) which can make lateral_relids get set in the RelOptInfo of a plain base relation? -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
[HACKERS] Document update in alter_foreign_data_wrapper.sgml
Hi, ISTM the documentation in alter_foreign_data_wrapper.sgml and the comment in foreigncmds.c should be updated. Please find attached a patch. Thanks, Best regards, Etsuro Fujita fdw-doc-update.patch Description: Binary data
Re: [HACKERS] Patch for fail-back without fresh backup
On 2013-10-24 13:51:52 -0700, Josh Berkus wrote: On 10/24/2013 01:14 PM, Heikki Linnakangas wrote: One extra WAL record whenever a hint bit is set on a page, for the first time after a checkpoint. In other words, a WAL record needs to be written in the same circumstances as with page checksums, but the WAL records are much smaller as they don't need to contain a full page image, just the block number of the changed block. Or maybe we'll write the full page image after all, like with page checksums, just without calculating the checksums. It might be tricky to skip the full-page image, because then a subsequent change of the page (which isn't just a hint-bit update) needs to somehow know it needs to take a full page image even though a WAL record for it was already written. I think it would be worth estimating what this actually looks like in terms of log write quantity. My inclination is to say that if it increases log writes less than 10%, we don't need to provide an option to turn it off. It entirely depends on your workload. If it happens to be something like: INSERT INTO table (lots_of_data); CHECKPOINT; SELECT * FROM TABLE; i.e. there's a checkpoint between loading the data and reading it - not exactly all that uncommon - we'll need to log something for every page. That can be rather noticeable, especially as I think it will be rather hard to log anything but a real FPI. I really don't think everyone will want this. I am absolutely not against providing an option to log enough information to make pg_rewind work, but I think providing a command to do *safe* *planned* failover will help in many more cases. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
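A back-of-envelope sketch of the estimate discussed above (all numbers here are assumptions for illustration, not measurements): if a full page image has to be logged for the first hint-bit set on each page after a checkpoint, the worst-case load/CHECKPOINT/read workload pays roughly one page of WAL per heap page:

```python
def extra_wal_bytes(table_pages, fpi_bytes=8192, record_overhead=48):
    """Worst case: one FPI-carrying WAL record per heap page whose hint
    bits are first set after a checkpoint (e.g. SELECT * right after a
    bulk load and CHECKPOINT touches every page).  The per-record
    overhead of 48 bytes is an assumed round figure."""
    return table_pages * (fpi_bytes + record_overhead)

# A 1 GB table is 131072 8 kB pages; reading it all post-checkpoint
# would emit on the order of a gigabyte of extra WAL in this model.
pages = (1 * 1024**3) // 8192
print(extra_wal_bytes(pages) / 1024**2)  # 1030.0 (MiB)
```

This is exactly the scenario where "less than 10% extra log writes" does not hold, supporting the argument that an off switch is needed.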
Re: [HACKERS] Patch for fail-back without fresh backup
On 2013-10-24 22:57:29 +0200, Magnus Hagander wrote: In fact I've been considering suggesting we might want to retire the difference between archive and hot_standby as wal_level, because the difference is usually so small. And the advantage of hot_standby is in almost every case worth it. Even in the archive recovery mode, being able to do pause_at_recovery_target is extremely useful. And as you say in (c) above, many users don't realize that until it's too late. +1. On 2013-10-25 15:16:30 +0900, Michael Paquier wrote: But it has been a couple of releases since there have been no such bugs, no? One 'no' too much? Anyway, I think there have been more recent ones, but it's infrequent enough that we can remove the level anyway. FWIW, I've wondered if we shouldn't remove most of the EnableHotStandby checks in xlog.c. There are way too many differences in how StartupXLOG behaves depending on HS. E.g. I quite dislike that we do stuff like StartupCLOG at entirely different times during recovery. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] logical changeset generation v6.2
On 2013-10-24 10:59:21 -0400, Robert Haas wrote: On Tue, Oct 22, 2013 at 2:13 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-10-22 13:57:53 -0400, Robert Haas wrote: On Tue, Oct 22, 2013 at 1:08 PM, Andres Freund and...@2ndquadrant.com wrote: That strikes me as a flaw in the implementation rather than the idea. You're presupposing a patch where the necessary information is available in WAL yet you don't make use of it at the proper time. The problem is that the mapping would be somewhere *ahead* from the transaction/WAL we're currently decoding. We'd need to read ahead till we find the correct one. Yes, I think that's what you need to do. My problem with that is that the rewrite can be gigabytes into the future. When reading forward we could either just continue reading data into the reorderbuffer, but delay replaying all future commits till we found the currently needed remap. That might have quite the additional storage/memory cost, but runtime complexity should be the same as normal decoding. Or we could individually read ahead for every transaction. But doing so for every transaction will get rather expensive (roughly O(amount_of_wal^2)). [ Sorry it's taken me a bit of time to get back to this; other tasks intervened, and I also just needed some time to let it settle in my brain. ] No worries. I've had enough things to work on ;) If you read ahead looking for a set of ctid translations from relfilenode A to relfilenode B, and along the way you happen to encounter a set of translations from relfilenode C to relfilenode D, you could stash that set of translations away somewhere, so that if the next transaction you process needs that set of mappings, it's already computed. With that approach, you'd never have to pre-read the same set of WAL files more than once. But, as I think about it more, that's not very different from your idea of stashing the translations someplace other than WAL in the first place.
I mean, if the read-ahead thread generates a series of files in pg_somethingorother that contain those maps, you could have just written the maps to that directory in the first place. So on further review I think we could adopt that approach. Yea, that basically was my reasoning, only expressed much more nicely ;) However, I'm leery about the idea of using a relation fork for this. I'm not sure whether that's what you had in mind, but it gives me the willies. First, it adds distributed overhead to the system, as previously discussed; and second, I think the accounting may be kind of tricky, especially in the face of multiple rewrites. I'd be more inclined to find a separate place to store the mappings. Note that, AFAICS, there's no real need for the mapping file to be block-structured, and I believe they'll be written first (with no readers) and subsequently only read (with no further writes) and eventually deleted. I was thinking of storing it alongside other data used during logical decoding and letting decoding's cleanup clean up that data as well. All the information for that should be there. There's one snag I currently can see, namely that we actually need to prevent a formerly dropped relfilenode from getting reused. Not entirely sure what the best way for that is. One possible objection to this is that it would preclude decoding on a standby, which seems like a likely enough thing to want to do. So maybe it's best to WAL-log the changes to the mapping file so that the standby can reconstruct it if needed. The mapping file probably can be one big WAL record, so it should be easy enough to do. For a moment I thought there's a problem with decoding on the standby having to read ahead of the current location to find the newer mapping, but that's actually not required since we're protected by the AEL lock during rewrites on the standby as well.
I think that'd be pretty similar to just disallowing VACUUM FREEZE/CLUSTER on catalog relations, since effectively it'd be too expensive to use. This seems unduly pessimistic to me; unless the catalogs are really darn big, this is a mostly theoretical problem. Well, it's not the size of the relation, but the amount of concurrent WAL that's being generated that matters. But anyway, if we do it like you described above that shouldn't be a problem. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
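The stash-as-you-read-ahead idea for ctid translations discussed above could be modeled roughly like this; it is a sketch only, and the structure and names are invented rather than taken from the patch:

```python
class RewriteMapCache:
    """Cache ctid translations produced by heap rewrites, keyed by
    (old_relfilenode, new_relfilenode), so each mapping is built once
    while reading ahead and reused by every transaction decoded later."""
    def __init__(self):
        self._maps = {}

    def record(self, old_node, new_node, translations):
        # translations: {old_ctid: new_ctid}, gathered during read-ahead
        self._maps[(old_node, new_node)] = dict(translations)

    def translate(self, old_node, new_node, ctid):
        m = self._maps.get((old_node, new_node))
        return None if m is None else m.get(ctid)

cache = RewriteMapCache()
# Stashed while scanning forward for a *different* rewrite:
cache.record(16384, 16399, {(0, 1): (3, 7)})
print(cache.translate(16384, 16399, (0, 1)))  # (3, 7)
```

As the mail notes, persisting these maps eagerly instead of rebuilding them from WAL makes the read-ahead pass unnecessary in the first place.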
Re: [HACKERS] logical changeset generation v6.2
On 2013-10-21 16:15:58 +0200, Andres Freund wrote: I don't think I understand exactly what you have in mind for (2); can you elaborate? I have always thought that having a WaitForDecodingToCatchUp() primitive was a good way of handling changes that were otherwise too difficult to track our way through. I am not sure you're doing that at all right now, which in some sense I guess is fine, but I haven't really understood your aversion to this solution. There are some locking issues to be worked out here, but the problems don't seem altogether intractable. So, what we need to do for rewriting catalog tables would be:
1) lock table against writes
2) wait for all in-progress xacts to finish, they could have modified the table in question (we don't keep locks on system tables)
3) acquire xlog insert pointer
4) wait for all logical decoding actions to read past that pointer
5) upgrade the lock to an access exclusive one
6) perform vacuum full as usual
The lock upgrade hazards in here are the reason I am averse to the solution. And I don't see how we can avoid them, since in order for decoding to catch up it has to be able to read from the catalog... Otherwise it's easy enough to implement. So, I thought about this some more and I think I have a partial solution to the problem. The worst thing about deadlocks that occur in the above is that they could be the VACUUM FULL waiting for the restart LSN[1] of a decoding slot to progress, but the restart LSN cannot progress because the slot is waiting for a xid/transaction to end which is being blocked by the lock upgrade from VACUUM FULL. Such conflicts are not visible to the deadlock detector, which obviously is bad. I've prototyped this (~25 lines) and this happens pretty frequently. But it turns out that we can actually fix this by exporting (to shared memory) the oldest in-progress xid of a decoding slot. Then the waiting code can do a XactLockTableWait() for that xid... I wonder if this isn't maybe sufficient.
Yes, it can deadlock, but that's already the case for VACUUM FULLs of system tables, although less likely. And it will be detected/handled. There's one more snag though: we currently allow CLUSTER system_table; in an existing transaction. I think that'd have to be disallowed. What do you think? Greetings, Andres Freund [1] The restart LSN is the point from where we need to be able to read WAL to replay all changes the receiving side hasn't acked yet. -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
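For clarity, the six-step sequence for rewriting catalog tables can be written out with stub primitives; every function name below is hypothetical (none of these exist in PostgreSQL under these names), and only the required ordering is being illustrated:

```python
log = []

def step(name):
    """Return a stub primitive that just records that it ran."""
    def fn(*args):
        log.append(name)
    return fn

# Hypothetical primitives standing in for the real operations.
lock_table_against_writes   = step("lock-writes")
wait_for_inprogress_xacts   = step("wait-xacts")
acquire_xlog_insert_pointer = step("get-lsn")
wait_for_decoding_past      = step("wait-decoding")
upgrade_to_access_exclusive = step("upgrade-lock")
perform_vacuum_full         = step("vacuum-full")

def rewrite_catalog_table(rel):
    lock_table_against_writes(rel)       # 1) block new writers
    wait_for_inprogress_xacts(rel)       # 2) older xacts may have written it
    lsn = acquire_xlog_insert_pointer()  # 3) fence point in the WAL stream
    wait_for_decoding_past(lsn)          # 4) decoding must read past the fence
    upgrade_to_access_exclusive(rel)     # 5) the hazardous lock upgrade
    perform_vacuum_full(rel)             # 6) rewrite under the strongest lock

rewrite_catalog_table("some_catalog")
print(log)
```

The deadlock hazard lives between steps 4 and 5: decoding must still read the catalog while VACUUM FULL waits, which is exactly the invisible-to-the-deadlock-detector cycle described above.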
Re: [HACKERS] Location for external scripts for Extensions?
On 10/24/13, 5:37 PM, Josh Berkus wrote: On 10/24/2013 02:36 PM, Peter Eisentraut wrote: On 10/22/13, 2:27 PM, Josh Berkus wrote: pg_partman has several external (python) scripts which help the extension, located in /extras/ in its source. The problem currently is that if you install pg_partman via pgxn or package, you don't get those scripts, because there's no install location for them. Use the SCRIPTS variable in pgxs, and they will get installed. Oh yeah? Cool. Is there an existing extension with an example of this? Don't know. I just saw this in the pgxs source code. Presumably, it was added because it was used somewhere.
[HACKERS] Detection of nested function calls
Hi all, The Oslandia team has been involved in the PostGIS project for years, with a current focus on PostGIS 3D support. With PostGIS queries, nested function calls that manipulate geometries are quite common, e.g.: SELECT ST_Union(ST_Intersection(a.geom, ST_Buffer(b.geom, 50))) PostGIS functions that manipulate geometries have to unserialize their input geometries from the 'flat' varlena representation to their own, and serialize the processed geometries back when returning. But in such nested call queries, this serialization-unserialization process is just overhead. Avoiding it could then lead to a real gain in terms of performance [1], especially here when the internal type takes time to serialize (and with new PostGIS types like rasters or 3D geometries it's really meaningful). So we thought having a way for user functions to know if they are part of a nested call could allow them to avoid this serialization phase. The idea would be to have a boolean flag reachable from a user function (within FunctionCallInfoData) that says if the current function is nested or not. We already investigated such a modification and here is where we are now: - we modified the parser with a new boolean member 'nested' in the FuncExpr struct. Within the parser, we know if a function call is nested into another one and then we can mark the FuncExpr - the executor has been modified so it can take this nested member into account and pass it to the FunctionCallInfoData structure before evaluating the function We are working on a PostGIS branch that takes advantage of this functionality [2]. You can find in attachment a first draft of the patch. Obviously, even if this is about a PostGIS use case here, this subject could be helpful for every other query using both nested functions and serialization. I am quite new to postgresql hacking, so I'm sure there is room for improvements. But, what about this first proposal?
I'll be at the PGDay conf in Dublin next week, so we could discuss this topic. [1] Talking about performance, we already investigated such a pass-by-reference mechanism with PostGIS. Taking a dummy function st_copy that only copies its input geometry to its output, with 4 levels of nesting, gives encouraging results (passing geometries by reference is more than 2x faster than (un)serializing): https://github.com/Oslandia/sfcgal-tests/blob/master/bench/report_serialization_referenced_vs_native.pdf [2] https://github.com/Oslandia/postgis/tree/nested_ref_passing -- Hugo Mercier Oslandia

diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 90c2753..56c701f 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -182,6 +182,8 @@ static Datum ExecEvalArrayCoerceExpr(ArrayCoerceExprState *astate,
 static Datum ExecEvalCurrentOfExpr(ExprState *exprstate, ExprContext *econtext,
 				bool *isNull, ExprDoneCond *isDone);
+static ExprState * ExecInitExprRec(Expr *node, PlanState *parent, Expr *parentNode);
+
 
 /*
  * ExecEvalExpr routines
@@ -4303,7 +4305,6 @@ ExecEvalExprSwitchContext(ExprState *expression,
 	return retDatum;
 }
 
-
 /*
  * ExecInitExpr: prepare an expression tree for execution
  *
@@ -4336,9 +4337,21 @@ ExecEvalExprSwitchContext(ExprState *expression,
  * associated with a plan tree. (If so, it can't have aggs or subplans.)
  * This case should usually come through ExecPrepareExpr, not directly here.
  */
+
 ExprState *
 ExecInitExpr(Expr *node, PlanState *parent)
 {
+	return ExecInitExprRec(node, parent, NULL);
+}
+
+/*
+ * ExecInitExprRec: used by ExecInitExpr
+ *
+ * The additional parameter 'parentNode' is the expression node parent of 'node'.
+ */
+static ExprState *
+ExecInitExprRec(Expr *node, PlanState *parent, Expr *parentNode)
+{
 	ExprState  *state;
 
 	if (node == NULL)
@@ -4408,10 +4421,10 @@ ExecInitExpr(Expr *node, PlanState *parent)
 			aggstate->aggs = lcons(astate, aggstate->aggs);
 			naggs = ++aggstate->numaggs;
 
-			astate->args = (List *) ExecInitExpr((Expr *) aggref->args,
-												 parent);
-			astate->aggfilter = ExecInitExpr(aggref->aggfilter,
-											 parent);
+			astate->args = (List *) ExecInitExprRec((Expr *) aggref->args,
+												 parent, node);
+			astate->aggfilter = ExecInitExprRec(aggref->aggfilter,
+											 parent, node);
 
 			/*
 			 * Complain if the aggregate's arguments contain any
@@ -4448,10 +4461,10 @@ ExecInitExpr(Expr *node, PlanState *parent)
 			if (wfunc->winagg)
 				winstate->numaggs++;
 
-			wfstate->args = (List *) ExecInitExpr((Expr *) wfunc->args,
-												  parent);
-			wfstate->aggfilter = ExecInitExpr(wfunc->aggfilter,
-											  parent);
+			wfstate->args = (List *) ExecInitExprRec((Expr *) wfunc->args,
+												  parent, node);
+			wfstate->aggfilter = ExecInitExprRec(wfunc->aggfilter,
+											  parent, node);
 
 			/*
 			 * Complain if
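Outside PostgreSQL, the overhead this patch tries to avoid can be illustrated with a toy model. The sketch below is purely illustrative: it uses Python's pickle to stand in for the 'flat' varlena representation, and a plain dict as the 'native' geometry, contrasting nested calls that round-trip through serialization with calls that pass the native object through:

```python
import pickle

def buffer_flat(flat_geom, dist):
    """Each call must unserialize its input and serialize its result,
    like a PostGIS function working on the flat varlena form."""
    geom = pickle.loads(flat_geom)                      # unserialize input
    geom = {"kind": geom["kind"], "r": geom["r"] + dist}
    return pickle.dumps(geom)                           # serialize result back

def buffer_native(geom, dist):
    """Same operation, but the object is passed through natively."""
    return {"kind": geom["kind"], "r": geom["r"] + dist}

g = {"kind": "point-buffer", "r": 0.0}

# Nested flat calls: 4 levels of nesting cost 4 loads + 4 dumps.
flat = pickle.dumps(g)
for _ in range(4):
    flat = buffer_flat(flat, 50)
print(pickle.loads(flat)["r"])   # 200.0

# Same nesting with native passing: identical result, no round-trips.
native = g
for _ in range(4):
    native = buffer_native(native, 50)
print(native["r"])               # 200.0
```

Both paths compute the same value; the difference is only the 2N (de)serializations for N nesting levels, which is what the 'nested' flag would let an inner function skip.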
Re: [HACKERS] Detection of nested function calls
Hello 2013/10/25 Hugo Mercier hugo.merc...@oslandia.com Hi all, The Oslandia team is involved in PostGIS project for years, with a current focus on PostGIS 3D support. With PostGIS queries, nested functions calls that manipulate geometries are quite common, e.g.: SELECT ST_Union(ST_Intersection(a.geom, ST_Buffer(b.geom, 50))) PostGIS functions that manipulate geometries have to unserialize their input geometries from the 'flat' varlena representation to their own, and serialize the processed geometries back when returning. But in such nested call queries, this serialization-unserialization process is just an overhead. Avoiding it could then lead to a real gain in terms of performances [1], especially here when the internal type takes time to serialize (and with new PostGIS types like rasters or 3D geometries it's really meaningful) So we thought having a way for user functions to know if they are part of a nested call could allow them to avoid this serialization phase. The idea would be to have a boolean flag reachable from a user function (within FunctionCallInfoData) that says if the current function is nested or not. We already investigated such a modification and here is where we are up to now : - we modified the parser with a new boolean member 'nested' to the FuncExpr struct. Within the parser, we know if a function call is nested into another one and then we can mark the FuncExpr - the executor has been modified so it can take into account this nested member and pass it to the FunctionCallInfoData structure before evaluating the function We are working on a PostGIS branch that takes benefit of this functionality [2] You can find in attachment a first draft of the patch. Obviously, even if this is about a PostGIS use case here, this subject could be helpful for every other queries using both nested functions and serialization. I am quite new to postgresql hacking, so I'm sure there is room for improvements. But, what about this first proposal ? 
I am not sure if this solution is enough - what will be done if I store some values in PL/pgSQL variables? Regards Pavel I'll be at the PGDay conf in Dublin next week, so we could discuss this topic. [1] Talking about performances, we already investigated such pass-by-reference mechanism with PostGIS. Taking a dummy function st_copy that only copies its input geometry to its output with 4 levels of nesting gives encouraging results (passing geometries by reference is more than 2x faster than (un)serializing): https://github.com/Oslandia/sfcgal-tests/blob/master/bench/report_serialization_referenced_vs_native.pdf [2] https://github.com/Oslandia/postgis/tree/nested_ref_passing -- Hugo Mercier Oslandia
[HACKERS] New committer
Hi pgpool hackers, Muhammad Usama did a great job in restructuring the pgpool source code. And now he is working on incorporating PostgreSQL's exception and memory manager. I believe now is the time to add him as one of our committers, and I have just done so. Please welcome Muhammad Usama as our new committer! -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: [HACKERS] Detection of nested function calls
On 25/10/2013 14:29, Pavel Stehule wrote: Hello 2013/10/25 Hugo Mercier hugo.merc...@oslandia.com. I am quite new to postgresql hacking, so I'm sure there is room for improvements. But, what about this first proposal? I am not sure, if this solution is enough - what will be done if I store some values in PL/pgSQL variables? You mean if you store the result of a (nested) function evaluation in a PL/pgSQL variable? Then no nesting will be detected by the parser, and in this case the user function must ensure its result is serialized, since it could be stored (in a variable or a table) at any time. -- Hugo Mercier Oslandia
Re: [HACKERS] Detection of nested function calls
2013/10/25 Hugo Mercier hugo.merc...@oslandia.com On 25/10/2013 14:29, Pavel Stehule wrote: Hello 2013/10/25 Hugo Mercier hugo.merc...@oslandia.com. I am quite new to postgresql hacking, so I'm sure there is room for improvements. But, what about this first proposal? I am not sure, if this solution is enough - what will be done if I store some values in PL/pgSQL variables? You mean if you store the result of a (nested) function evaluation in a PL/pgSQL variable? Then no nesting will be detected by the parser and in this case the user function must ensure its result is serialized, since it could be stored (in a variable or a table) at any time. OK, I remember; I thought about a similar optimization when I worked on the SQL/XML implementation - so the same optimization can be used there. Regards Pavel -- Hugo Mercier Oslandia
Re: [HACKERS] CLUSTER FREEZE
On Fri, Oct 25, 2013 at 2:12 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-10-24 17:17:22 -0700, Josh Berkus wrote: On 10/24/2013 04:55 PM, Robert Haas wrote: On Thu, Oct 24, 2013 at 1:09 PM, Josh Berkus j...@agliodbs.com wrote: On 10/23/2013 09:58 PM, Amit Kapila wrote: I wonder why anyone would like to freeze during CLUSTER command when they already have separate way (VACUUM FREEZE) to achieve it, do you know or can think of any case where user wants to do it along with Cluster command? If I'm rewriting the table anyway, let's freeze it. Otherwise, you have to write the same pages twice, if both CLUSTER and FREEZE are required. I wonder if we should go so far as to make this the default behavior, instead of just making it an option. +1 from me. Can you think of a reason you *wouldn't* want to freeze? It makes content from the future appear when you start using the relation in a query/session with an older snapshot. Currently CLUSTER is safe against that. Eh, what? We wouldn't freeze XIDs that don't precede RecentGlobalXmin. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] CLUSTER FREEZE
On 2013-10-25 09:13:20 -0400, Robert Haas wrote: On Fri, Oct 25, 2013 at 2:12 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-10-24 17:17:22 -0700, Josh Berkus wrote: On 10/24/2013 04:55 PM, Robert Haas wrote: On Thu, Oct 24, 2013 at 1:09 PM, Josh Berkus j...@agliodbs.com wrote: On 10/23/2013 09:58 PM, Amit Kapila wrote: I wonder why anyone would like to freeze during CLUSTER command when they already have separate way (VACUUM FREEZE) to achieve it, do you know or can think of any case where user wants to do it along with Cluster command? If I'm rewriting the table anyway, let's freeze it. Otherwise, you have to write the same pages twice, if both CLUSTER and FREEZE are required. I wonder if we should go so far as to make this the default behavior, instead of just making it an option. +1 from me. Can you think of a reason you *wouldn't* want to freeze? It makes content from the future appear when you start using the relation in a query/session with an older snapshot. Currently CLUSTER is safe against that. Eh, what? We wouldn't freeze XIDs that don't precede RecentGlobalXmin. Ah sorry, I thought that'd be the plan, similar to COPY FREEZE. Maybe because I've wished for it in the past ;) In that case I agree it should be the default. There really isn't any reason not to immediately freeze tuples that can be frozen according to the xmin horizon. We don't immediately do it during normal vacuums because it would possibly cause superfluous I/O - but that's not the case here. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
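The rule being converged on here, freezing only tuples whose xmin precedes RecentGlobalXmin (the xmin horizon), amounts to a simple predicate. The sketch below ignores 32-bit XID wraparound comparisons for simplicity, so the XID values are illustrative only:

```python
def can_freeze(tuple_xmin, recent_global_xmin):
    """A tuple's xmin may be frozen during the rewrite only if it is
    visible to every snapshot still running, i.e. it precedes
    RecentGlobalXmin (modulo-2^32 XID comparison ignored here)."""
    return tuple_xmin < recent_global_xmin

# Tuples committed before the horizon (300) are freezable; the one
# from a still-recent transaction (350) must keep its xmin.
freezable = [xid for xid in (100, 205, 350) if can_freeze(xid, 300)]
print(freezable)  # [100, 205]
```

This is why freezing during CLUSTER cannot make "content from the future" appear: anything newer than the horizon keeps its xmin and remains subject to normal visibility checks.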
Re: [HACKERS] CLUSTER FREEZE
On Thu, Oct 24, 2013 at 10:19 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I wonder if we should go so far as to make this the default behavior, instead of just making it an option. In that case you'd have to invent a NOFREEZE keyword, no? Ick. Only if we think anyone would ever NOT want to freeze. In any case, it's very far from obvious to me that CLUSTER ought to throw away information by default, which is what you're proposing. I find it odd to refer to this as throwing away information. I know that you have a general concern about throwing away XIDs that are still needed for forensic purposes, but that is clearly the ONLY purpose those XIDs serve, and the I/O advantages of freezing by default could be massive for many of our users. What's going to happen in practice is that experienced users will simply recommend CLUSTER FREEZE rather than plain CLUSTER, and you won't have the forensic information *anyway*. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] UNION ALL on partitioned tables won't use indices.
On Thu, Oct 24, 2013 at 6:39 AM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: Hello, Sorry that it's been a while.. 1. Observed symptom As you know, UNION ALL accompanied with ORDER BY uses indexes if available. uniontest=# EXPLAIN SELECT * FROM c11 UNION ALL SELECT * FROM c12 ORDER BY a; QUERY PLAN --- Merge Append (cost=0.59..10214.10 rows=20 width=16) Sort Key: c11.a -> Index Scan using ... on c11 (cost=0.29..3857.04 rows=10 width=16) -> Index Scan using ... on c12 (cost=0.29..3857.04 rows=10 width=16) So do simple queries on partitioned (inheritance) tables. uniontest=# EXPLAIN SELECT * FROM p1 ORDER BY a; QUERY PLAN -- Merge Append (cost=0.73..11392.19 rows=21 width=16) Sort Key: p1.a -> Index Scan using ... on p1 (cost=0.12..8.14 rows=1 width=44) -> Index Scan using ... on c11 (cost=0.29..3857.04 rows=10 width=16) -> Index Scan using ... on c12 (cost=0.29..3857.04 rows=10 width=16) Nevertheless, UNION ALL on partitioned tables doesn't. This is quite unfavourable behavior, especially with LIMIT. uniontest=# EXPLAIN ANALYZE SELECT * FROM p1 UNION ALL SELECT * FROM p2 ORDER BY a LIMIT 10; QUERY PLAN Limit (actual time=182.732..182.735 rows=10 loops=1) -> Sort (actual time=182.729..182.730 rows=10 loops=1) Sort Key: p1.a Sort Method: top-N heapsort Memory: 25kB -> Append (actual time=0.029..108.109 rows=40 loops=1) -> Seq Scan on p1 (actual time=0.001..0.001 rows=0 loops=1) -> Seq Scan on c11 (actual time=0.027..19.074 rows=10 loops=1) -> Seq Scan on c12 (actual time=0.014..16.653 rows=10 loops=1) -> Seq Scan on p2 (actual time=0.000..0.000 rows=0 loops=1) -> Seq Scan on c21 (actual time=0.012..16.677 rows=10 loops=1) -> Seq Scan on c22 (actual time=0.012..16.794 rows=10 loops=1) Total runtime: 182.857 ms 2. The cause In grouping_planner, flatten_simple_union_all creates appendrelinfos for each subquery, then expand_inherited_tables further expands the parent tables in each subquery. Finally, this sequence leaves the following structure. 
Where rte[2] and [3] are subqueries abandoned while pulling up rte[4] and [5]. rte[1] Subquery SELECT*1, inh = 1 +- appinfo[0] -> rte[4] Relation p1, inh = 1 | +- appinfo[2] -> rte[6] Relation p1, inh = 0 | +- appinfo[3] -> rte[7] Relation c11, inh = 0 | +- appinfo[4] -> rte[8] Relation c12, inh = 0 +- appinfo[1] -> rte[5] Relation p2, inh = 1 +- appinfo[5] -> rte[9] Relation p2, inh = 0 +- appinfo[6] -> rte[10] Relation c21, inh = 0 +- appinfo[7] -> rte[11] Relation c22, inh = 0 On the other hand, the EC finally has member exprs only for varno = 1, 4 and 5; in other words, it lacks members for the most-descendant RTEs. This is because add_child_rel_equivalences() always inhibits add_eq_member from registering the child_rel's relid on the EC. Consequently these grandchild relations do not find index pathkeys for themselves. 3. Measures I could think of three approaches to the problem. One is to simply modify this spot to pass the child flag to add_eq_member according to whether the child_rel is inh or not. This seems to work fine, although it leaves two levels of MergeAppends (PATCH-1). So the additional patch is attached to collapse these MergeAppends (PATCH-2). This gives the same plan as PATCH-3. uniontest=# explain analyze select * from p1 union all select * from p2 order by a limit 10; QUERY PLAN Limit (actual time=0.208..0.224 rows=10 loops=1) -> Merge Append (actual time=0.205..0.218 rows=10 loops=1) Sort Key: p1.a -> Merge Append (actual time=0.110..0.120 rows=10 loops=1) Sort Key: p1.a -> Index Scan .. on p1 (actual time=0.006..0.006 rows=0 loops=1) -> Index Scan .. on c11 (actual time=0.054..0.060 rows=10 loops=1) -> Index Scan .. on c12 (actual time=0.046..0.046 rows=1 loops=1) -> Merge Append (actual time=0.093..0.093 rows=1 loops=1) Sort Key: p2.a -> Index Scan .. on p2 (actual time=0.002..0.002 rows=0 loops=1) -> Index Scan .. on c21 (actual time=0.047..0.047 rows=1 loops=1) -> Index Scan .. 
on c22 (actual time=0.043..0.043 rows=1 loops=1) Total runtime: 0.448 ms The second is to collapse the appendrel structure shown above to have only a single level
Re: [HACKERS] proposal: lob conversion functionality
fixed documentation Regards Pavel 2013/10/24 Heikki Linnakangas hlinnakan...@vmware.com On 22.10.2013 13:55, Pavel Stehule wrote: 2013/10/21 Noah Misch n...@leadboat.com If you're prepared to change the function names and add the subset-oriented functions, I would appreciate that. here is patch lobj.sgml still refers to the old names. - Heikki diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index e397386..8509d09 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -3399,6 +3399,78 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); See also the aggregate function functionstring_agg/function in xref linkend=functions-aggregate. /para + + table id=functions-binarystring-largeobjects-transformations +titleLoading and saving from/to Large Objects Functions/title +tgroup cols=5 + thead + row + entryFunction/entry + entryReturn Type/entry + entryDescription/entry + entryExample/entry + entryResult/entry + /row + /thead + + tbody + row + entry +indexterm + primarylo_create/primary +/indexterm + literalfunctionlo_create(parameterloid/parameter typeoid/type, parameterstring/parameter typebytea/type )/function/literal + /entry + entrytypeoid/type/entry + entry + Create a large object and store a binary string there. Returns the oid of the + created large object. + /entry + entryliteralselect lo_create(0, decode('ff00','hex'))/literal/entry + entryliteral24528/literal/entry + /row + + row + entry +indexterm + primarylo_get/primary +/indexterm +literalfunctionlo_get(parameterloid/parameter typeoid/type optional, parameteroffset/parameter typebigint/type, parameterstr/parameter typebytea/type/optional)/function/literal + /entry + entrytypebytea/type/entry + entry +Returns a binary string based on the content of the given large object. Attention: a binary +string has a lower size limit (1GB) than large objects (4GB). Processing a very large +large object can be very expensive in memory resources, as the bytea data are completely +held in memory. 
+ /entry + entryliterallo_get(24628)/literal/entry + entryliteral\xff00/literal/entry + /row + + row + entry +indexterm + primarylo_put/primary +/indexterm +literalfunctionlo_put(parameterloid/parametertypeoid/type, parameteroffset/parameter typebigint/type, parameterstr/parameter typebytea/type)/function/literal + /entry + entrytypevoid/type/entry + entry +Write data at offset. + /entry + entryliterallo_put(24628, 0, decode('', 'hex'))/literal/entry + entry/entry + /row + +/tbody + /tgroup + /table + + para + See also a description of other Large Objects Function + in xref linkend=lo-funcs. + /para /sect1 diff --git a/doc/src/sgml/lobj.sgml b/doc/src/sgml/lobj.sgml index bb3e08f..32fda11 100644 --- a/doc/src/sgml/lobj.sgml +++ b/doc/src/sgml/lobj.sgml @@ -580,6 +580,42 @@ SELECT lo_export(image.raster, '/tmp/motd') FROM image these functions as functionloread/ and functionlowrite/. /para + para + There are two other functions that don't correspond with the client API + (see xref linkend=functions-binarystring-largeobjects-transformations). + functionlo_create/function transforms a binary string to a lo object, + functionlo_get/function transforms a lo object to a binary string, and + functionlo_put/function writes a binary string at an offset into a lo object. 
+ /para + + para + Some examples: +programlisting +SELECT lo_create(0, decode('ffffff00','hex')); + lo_create +--- + 16392 +(1 row) + +SELECT lo_get(16392); + lo_get + + \xffffff00 +(1 row) + +SELECT lo_put(16392, 1, decode('aa','hex')); + lo_put + + +(1 row) + +SELECT lo_get(16392); + lo_get + + \xffaaff00 +(1 row) +/programlisting + /para /sect1 sect1 id=lo-examplesect diff --git a/src/backend/libpq/be-fsstubs.c b/src/backend/libpq/be-fsstubs.c index fa00383..aa12349 100644 --- a/src/backend/libpq/be-fsstubs.c +++ b/src/backend/libpq/be-fsstubs.c @@ -754,3 +754,137 @@ deleteLOfd(int fd) { cookies[fd] = NULL; } + +/* + * auxiliary LO functions for managing LOs from SQL and PL + */ + +/* + * Load an LO fragment and return it as bytea + * + * When nbytes is -1, read from the given offset to the end. + */ +static bytea * +lo_get_fragment_internal(Oid loOid, int64 offset, int nbytes) +{ + LargeObjectDesc *loDesc; + int64 loSize; + int result_length; + int total_read; + bytea *result = NULL; + + /* + * We don't actually need to store into fscxt, but create it
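[Editor's note: the byte-level semantics the patch documents for lo_get and lo_put can be modeled in a few lines. This is plain Python mimicking the described behavior, not the server code; the -1 convention for nbytes follows the comment in lo_get_fragment_internal above.]

```python
# lo_get(loid, offset, nbytes): slice of the object, nbytes = -1 means
# "from offset to the end".  lo_put(loid, offset, fragment): overwrite bytes
# at the given offset, extending the object if the write runs past its end.

def lo_get(data: bytes, offset: int = 0, nbytes: int = -1) -> bytes:
    if nbytes == -1:
        return data[offset:]
    return data[offset:offset + nbytes]

def lo_put(data: bytes, offset: int, fragment: bytes) -> bytes:
    # Pad with zero bytes if writing past the current end of the object.
    if offset > len(data):
        data = data + b"\x00" * (offset - len(data))
    return data[:offset] + fragment + data[offset + len(fragment):]

lo = bytes.fromhex("ffffff00")
lo = lo_put(lo, 1, bytes.fromhex("aa"))
assert lo.hex() == "ffaaff00"        # matches the lobj.sgml example above
assert lo_get(lo, 1, 2).hex() == "aaff"
```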
Re: [HACKERS] Example query causing param_info to be set in plain rel path
Ashutosh Bapat ashutosh.ba...@enterprisedb.com writes: In order to test various cases of LATERAL join in Postgres-XC, I am trying to find a query where RelOptInfo->lateral_relids would get set for plain base relations. I think you need a lateral reference in a function or VALUES FROM-item. As you say, plain sub-selects are likely to get flattened. (Possibly if you stuck in a flattening fence such as OFFSET 0, you could get the case to happen with a sub-select FROM item, but I'm too lazy to check.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RULE regression test fragility?
Andres Freund wrote: On 2013-10-24 09:22:52 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: FWIW, I've repeatedly now thought that it'd make maintaining/updating patches easier if we switched that query into unaligned tuple only (\a \t) mode. That would remove the frequent conflicts on the row count and widespread changes due to changed alignment. Alternatively we could just wrap the query in \copy ... CSV. Hm ... yeah, it would be a good thing if changes in one view didn't so frequently have ripple effects to the whole output. Not sure which format is best for that though. Something like the attached maybe? +1 (but what are those silly parens in pg_seclabels definition?), except: +-- disable fancy output again +\a\t Should be enable. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Detection of nested function calls
Hugo Mercier hugo.merc...@oslandia.com writes: PostGIS functions that manipulate geometries have to unserialize their input geometries from the 'flat' varlena representation to their own, and serialize the processed geometries back when returning. But in such nested call queries, this serialization-unserialization process is just an overhead. This is a reasonable thing to worry about, not just for PostGIS types but for many container types such as arrays --- it'd be nice to be able to work with an in-memory representation that wasn't just a contiguous blob of data. For instance, assignment to an array element might become a constant-time operation even when working with variable-length datatypes. So we thought having a way for user functions to know if they are part of a nested call could allow them to avoid this serialization phase. However, this seems like a completely wrong way to go at it. In the first place, it wouldn't help for situations like a complex value stored in a plpgsql variable. In the second, I don't think that what you are describing scales to any more than the most trivial situations. What about functions with more than one complex-type input, for example? And you'd need to be certain that every single function taking or returning the datatype gets updated at exactly the same time, else it'll break. I think the right way to attack it is to create some way for a Datum value to indicate, at runtime, whether it's a flat value or an in-memory representation. Any given function returning the type could choose to return either representation. The datatype would have to provide a way to serialize the in-memory representation, when and if it came time to store it in a table. 
To avoid breaking functions that hadn't yet been taught about the new representation, we'd probably want to redefine the existing DETOAST macros as also invoking this datatype flattening function, and then you'd need to use some new access macro if you wanted visibility of the non-flat representation. (This assumes that the whole thing is only applicable to toastable datatypes, but that seems like a reasonable restriction.) Another thing that would have to be attacked in order to make the plpgsql-variable case work is that you'd need some design for copying such Datums in-memory, and perhaps a reference count mechanism to optimize away unnecessary copies. Your idea of tying the optimization to the nested function call scenario would avoid the need to solve this problem, but I think it's too narrow a scope to justify all the other work that'd be involved. Some colleagues of mine at Salesforce have been playing with ideas like this, though last I heard they were nowhere near having a submittable patch. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
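[Editor's note: to make the proposal concrete, here is a toy model of the flat-versus-expanded Datum scheme Tom describes. All names are hypothetical (none of this is a PostgreSQL API); JSON stands in for a type's flatten/expand routines, and a Python list stands in for the in-memory array representation.]

```python
# A Datum carries either a flat serialized value or an in-memory "expanded"
# representation.  Legacy callers go through a detoast-style accessor that
# flattens on demand; aware callers ask for the expanded form and skip
# serialization entirely between nested calls.
import json

class Datum:
    def __init__(self, flat=None, expanded=None):
        self.flat = flat            # bytes, or None
        self.expanded = expanded    # e.g. a list, or None

def datum_detoast(d: Datum) -> bytes:
    """Legacy access path: always yields the flat representation."""
    if d.flat is None:
        d.flat = json.dumps(d.expanded).encode()   # type-provided flattener
    return d.flat

def datum_expand(d: Datum):
    """New access path: yields the in-memory form, deserializing if needed."""
    if d.expanded is None:
        d.expanded = json.loads(d.flat)
    return d.expanded

# An array_append-style function that works on the expanded form, so nested
# calls never serialize intermediate results:
def append_elem(d: Datum, x) -> Datum:
    return Datum(expanded=datum_expand(d) + [x])

d = append_elem(append_elem(Datum(flat=b"[1, 2]"), 3), 4)
assert datum_expand(d) == [1, 2, 3, 4]
assert datum_detoast(d) == b"[1, 2, 3, 4]"   # flattened only at the end
```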
Re: [HACKERS] New committer
Tatsuo Ishii is...@postgresql.org writes: Hi pgpool hackers, Muhammad Usama did great job in restructuring pgpool source code. And now he is working on incorporating PostgreSQL's exception and memory manager. I believe now is the time to add him to our one of committers and I just have done so. I assume this went to the wrong mailing list? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Detection of nested function calls
On 2013-10-25 10:18:27 -0400, Tom Lane wrote: I think the right way to attack it is to create some way for a Datum value to indicate, at runtime, whether it's a flat value or an in-memory representation. Any given function returning the type could choose to return either representation. The datatype would have to provide a way to serialize the in-memory representation, when and if it came time to store it in a table. To avoid breaking functions that hadn't yet been taught about the new representation, we'd probably want to redefine the existing DETOAST macros as also invoking this datatype flattening function, and then you'd need to use some new access macro if you wanted visibility of the non-flat representation. (This assumes that the whole thing is only applicable to toastable datatypes, but that seems like a reasonable restriction.) That sounds reasonable, and we have most of the infrastructure for it since the indirect toast thing got in. Another thing that would have to be attacked in order to make the plpgsql-variable case work is that you'd need some design for copying such Datums in-memory, and perhaps a reference count mechanism to optimize away unnecessary copies. Your idea of tying the optimization to the nested function call scenario would avoid the need to solve this problem, but I think it's too narrow a scope to justify all the other work that'd be involved. I've thought about refcounting Datums several times, but I always got stuck when thinking about how to deal with memory context resets and errors. Any ideas about that? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.4
Hi, On 2013-10-22 16:07:16 +0200, Andres Freund wrote: On 2013-10-21 20:16:29 +0200, Andres Freund wrote: Current draft is: ALTER TABLE ... REPLICA IDENTITY NOTHING|FULL|DEFAULT ALTER TABLE ... REPLICA IDENTITY USING INDEX ...; which leaves the door open for ALTER TABLE ... REPLICA IDENTITY USING '(' column_name_list ')'; Does anybody have a strong feeling about requiring support for CREATE TABLE for this? Attached is a patch on top of master implementing this syntax. It's not wired up to the changeset extraction patch yet as I am not sure whether others agree about the storage. So, I am currently wondering about how to store the old tuple, based on this. Currently it is stored using the TupleDesc of the index the old tuple is based on. But if we want to allow transporting the entire tuple that obviously cannot be the only option. One option would be to change the stored format based on what's configured, using the relation's TupleDesc if FULL is used. But I think always using the heap relation's desc is better. The not-logged columns would then just be represented as NULLs. That will make old primary keys bigger if the relation has a high number of columns and the key is small, but I don't think it matters enough. Opinions? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
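[Editor's note: for concreteness, the storage option Andres favors, logging the old tuple under the heap relation's full descriptor with non-identity columns represented as NULLs, can be sketched like this. Illustrative Python with hypothetical names, not the patch's code.]

```python
# Project a row onto the relation's full column list, replacing every column
# outside the configured replica identity with NULL (None here).  The logged
# old tuple thus always matches the heap relation's TupleDesc, regardless of
# which identity (index columns, FULL, NOTHING) is configured.

def old_tuple_for_wal(row: dict, columns: list, identity_cols: set) -> tuple:
    return tuple(row[c] if c in identity_cols else None for c in columns)

columns = ["id", "email", "payload"]
row = {"id": 7, "email": "x@example.com", "payload": "big blob"}

# REPLICA IDENTITY USING INDEX (on id): only the key columns survive
assert old_tuple_for_wal(row, columns, {"id"}) == (7, None, None)
# REPLICA IDENTITY FULL: the entire old tuple is logged
assert old_tuple_for_wal(row, columns, set(columns)) == \
    (7, "x@example.com", "big blob")
# REPLICA IDENTITY NOTHING: all NULLs
assert old_tuple_for_wal(row, columns, set()) == (None, None, None)
```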
Re: [HACKERS] Detection of nested function calls
Andres Freund and...@2ndquadrant.com writes: On 2013-10-25 10:18:27 -0400, Tom Lane wrote: I think the right way to attack it is to create some way for a Datum value to indicate, at runtime, whether it's a flat value or an in-memory representation. That sounds reasonable, and we have most of the infrastructure for it since the indirect toast thing got in. Oh really? I hadn't been paying much attention to that, but obviously I better go back and study it. I've thought about refcounting Datums several times, but I always got stuck when thinking about how to deal with memory context resets and errors. Any ideas about that? Not yet. But it makes no sense to claim that a Datum could have a reference that's longer-lived than the memory context it's in, so I'm not sure the context reset case is really a problem. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Detection of nested function calls
On 25/10/2013 16:18, Tom Lane wrote: Hugo Mercier hugo.merc...@oslandia.com writes: PostGIS functions that manipulate geometries have to unserialize their input geometries from the 'flat' varlena representation to their own, and serialize the processed geometries back when returning. But in such nested call queries, this serialization-unserialization process is just an overhead. This is a reasonable thing to worry about, not just for PostGIS types but for many container types such as arrays --- it'd be nice to be able to work with an in-memory representation that wasn't just a contiguous blob of data. For instance, assignment to an array element might become a constant-time operation even when working with variable-length datatypes. So we thought having a way for user functions to know if they are part of a nested call could allow them to avoid this serialization phase. However, this seems like a completely wrong way to go at it. In the first place, it wouldn't help for situations like a complex value stored in a plpgsql variable. In the second, I don't think that what you are describing scales to any more than the most trivial situations. What about functions with more than one complex-type input, for example? And you'd need to be certain that every single function taking or returning the datatype gets updated at exactly the same time, else it'll break. About plpgsql variables: no, there won't be any optimization in that case. At the time the function result has to be stored in a variable, it must be serialized. About functions with more than one complex-type input, as long as all parameters are of the same type, there is no problem with that. But if your function deals with more than one complex type AND you want to avoid serialization on each parameter, then yes, each type must be aware of this possible optimization (choose whether to serialize or not). I don't understand what you mean by be certain that every single function ... 
gets updated at exactly the same time. Could you elaborate? I think the right way to attack it is to create some way for a Datum value to indicate, at runtime, whether it's a flat value or an in-memory representation. Any given function returning the type could choose to return either representation. The datatype would have to provide a way to serialize the in-memory representation, when and if it came time to store it in a table. To avoid breaking functions that hadn't yet been taught about the new representation, we'd probably want to redefine the existing DETOAST macros as also invoking this datatype flattening function, and then you'd need to use some new access macro if you wanted visibility of the non-flat representation. (This assumes that the whole thing is only applicable to toastable datatypes, but that seems like a reasonable restriction.) You're totally right. That is very close to what I am working on with PostGIS. This is still early work, but for some details: https://github.com/Oslandia/postgis/blob/nested_ref_passing/postgis/lwgeom_ref.h Basically, the 'geometry' type of PostGIS is here extended with a flag saying if the data is actual 'flat' data or a plain pointer. And if this is a pointer, a type identifier is stored. And there is a new DETOAST macro (here POSTGIS_DETOAST_DATUM) that will test whether the Datum is a pointer and, if so, call the corresponding deserializing functions. So you can avoid copies if your function is aware of that, and the change for existing functions will be minimal. https://github.com/Oslandia/postgis/blob/nested_ref_passing/postgis/lwgeom_ref.c You said when and if it came time to store it in a table. And that is exactly the point of this 'nested' boolean: otherwise, from a function's point of view, when do you know that it is time to store into a table? 
Another thing that would have to be attacked in order to make the plpgsql-variable case work is that you'd need some design for copying such Datums in-memory, and perhaps a reference count mechanism to optimize away unnecessary copies. Your idea of tying the optimization to the nested function call scenario would avoid the need to solve this problem, but I think it's too narrow a scope to justify all the other work that'd be involved. Do you think it must necessarily cover the plpgsql variable case to be acceptable? -- Hugo Mercier Oslandia -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Detection of nested function calls
On 2013-10-25 11:01:28 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2013-10-25 10:18:27 -0400, Tom Lane wrote: I think the right way to attack it is to create some way for a Datum value to indicate, at runtime, whether it's a flat value or an in-memory representation. That sounds reasonable, and we have most of the infrastructure for it since the indirect toast thing got in. Oh really? I hadn't been paying much attention to that, but obviously I better go back and study it. Well, it has the infrastructure for adding further types of varattrib_1b_e types and for computing the size independently. So you can easily add a new type of toast datum. There still needs to be handling for it in tuptoaster.c et al, but that's not surprising ;) I've thought about refcounting Datums several times, but I always got stuck when thinking about how to deal with memory context resets and errors. Any ideas about that? Not yet. But it makes no sense to claim that a Datum could have a reference that's longer-lived than the memory context it's in, so I'm not sure the context reset case is really a problem. Given how short-lived many of the contexts used for expression evaluation are, that might restrict the usefulness quite a bit. I think at the very least it has to be allowed that a Datum also gets used in child contexts. But that already opens up the door for refcount leakage when the child context gets destroyed. I wonder if this needs mcxt.c/aset.c support to be useful. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Detection of nested function calls
Hugo Mercier hugo.merc...@oslandia.com writes: Le 25/10/2013 16:18, Tom Lane a écrit : However, this seems like a completely wrong way to go at it. In the first place, it wouldn't help for situations like a complex value stored in a plpgsql variable. In the second, I don't think that what you are describing scales to any more than the most trivial situations. What about functions with more than one complex-type input, for example? And you'd need to be certain that every single function taking or returning the datatype gets updated at exactly the same time, else it'll break. About functions with more than one complex-type input, as soon as each parameter are of the same type, there is no problem with that. How do you tell the difference between foo(col1, bar(col2)) foo(bar(col1), col2) I don't understand what you mean by be certain that every single function ... gets updated at exactly the same time. Could you develop ? If you're tying this to the syntax of the expression, then bar() *must* return a non-serialized value when and only when foo() is expecting that, therefore their implementations must change at the same time. Perhaps that's workable for PostGIS, but it's a complete nonstarter for widely-known datatypes like arrays, where affected functions might be spread through any number of extensions. We need a design that permits incremental fixing of functions that work with a deserializable datatype. Another point worth worrying about is that not all expressions are function calls, nor do all function calls arise from expressions. Chasing down all the corner cases and making sure they work properly in a syntax-driven approach is going to be a headache. Basically, the 'geometry' type of PostGIS is here extended with a flag saying if the data is actual 'flat' data or a plain pointer. And if this is a pointer, a type identifier is stored. If you're doing that, why do you need the decoration on the FuncExpr expressions? 
Can't you just look at your input datums and see if they're flat or not? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Detection of nested function calls
On 25/10/2013 17:20, Tom Lane wrote: Hugo Mercier hugo.merc...@oslandia.com writes: On 25/10/2013 16:18, Tom Lane wrote: How do you tell the difference between foo(col1, bar(col2)) foo(bar(col1), col2) Still not sure I understand ... I assume foo() takes two arguments of type A. bar() can take one argument of A or another type B. In bar(), you would have the choice to return either a plain A or a pointer to A. Because bar() knows its call is nested (by foo()), then it can decide to return a pointer to A. foo() is then evaluated and we assume it knows A can be a pointer. foo() then knows its nesting level is 0 and must return something serialized in that case. I don't understand what you mean by be certain that every single function ... gets updated at exactly the same time. Could you elaborate? If you're tying this to the syntax of the expression, then bar() *must* return a non-serialized value when and only when foo() is expecting that, therefore their implementations must change at the same time. Perhaps that's workable for PostGIS, but it's a complete nonstarter for widely-known datatypes like arrays, where affected functions might be spread through any number of extensions. We need a design that permits incremental fixing of functions that work with a deserializable datatype. Yes. It could work for each user type, assuming each function working with this type is aware of this pointer/serialized nature, including extensions. So you have to, at least, recompile every extension depending on those types. Which ... limits the interest for very general types, I have to admit. Another point worth worrying about is that not all expressions are function calls, nor do all function calls arise from expressions. Chasing down all the corner cases and making sure they work properly in a syntax-driven approach is going to be a headache. We could add this 'nesting' detection to operators (and probably other constructs that I don't know of) little by little. 
Is optimizing only function calls as a first step not enough? Basically, the 'geometry' type of PostGIS is here extended with a flag saying if the data is actual 'flat' data or a plain pointer. And if this is a pointer, a type identifier is stored. If you're doing that, why do you need the decoration on the FuncExpr expressions? Can't you just look at your input datums and see if they're flat or not? If a function returns a pointer whatever the nesting level is, you could end up with something storing a raw pointer, which is bad. You could eventually add a way to detect that what you stored was a pointer and that your data no longer exists (be NULL?) when read back, but you basically end up with users manipulating pointers, which is bad. If you want to make it transparent to the user, you need to know the nesting level to decide whether you could just pass it to something that is aware of this pointer (nesting level >= 1) or serialize it back (nesting level == 0). -- Hugo Mercier Oslandia -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] LISTEN / NOTIFY enhancement request for Postgresql
Hi pgsql-hackers, The LISTEN / NOTIFY feature (along with the pg_notify() function) is a unique feature that differentiates Postgresql from nearly all other relational database systems. With the exception of SQL Server, I know of no other RDBMSs that allow a client to be asynchronously notified by the database server. This feature embodies the modern push approach and allows delivering timely data to the user as it changes, instead of the more traditional pull approach which requires the user to request the data at specific intervals. Vendors are rolling out push technologies to meet market demand. Microsoft recently introduced SignalR - which is a framework for pushing content to ASP.NET Web pages. Similarly Complex Event Processing systems push information to users' dashboards in real-time. In contrast with RDBMS's where asynchronous notification is a special feature, message broker software implementations live and breathe asynchronous notification. So I feel that the LISTEN / NOTIFY feature is trying to deliver some of the asynchronous notification features of a message broker but it lacks some of the flexibility. One particular shortcoming of LISTEN / NOTIFY is the fact that the channel specified on the LISTEN must _exactly_ match the channel specified on the NOTIFY. Here is an example of the problem: I have two listeners: 1. Interested in all stock quote updates 2. Interested in stock quote updates for IBM only There is a table that contains stock prices with a trigger proc that issues a NOTIFY using pg_notify() upon update. There isn't a single channel that I can use that will deliver the message to both listeners. To get around the problem I could publish a message on channel PRICE and another message on channel PRICE.IBM but sending two notifications is far from optimal. Message brokers have implemented a neat way to get around this issue. It is accomplished by allowing wildcards in message topic subscriptions. 
Here is an example implementation: http://activemq.apache.org/nms/activemq-wildcards.html

. is used to separate names in a path
* is used to match any name in a path
> is used to recursively match any destination starting from this name

For example, using the example above, these subscriptions are possible:

Subscription          Meaning
PRICE.>               Any price for any product on any exchange
PRICE.STOCK.>         Any price for a stock on any exchange
PRICE.STOCK.NASDAQ.*  Any stock price on NASDAQ
PRICE.STOCK.*.IBM     Any IBM stock price on any exchange

My request is to implement the same or similar feature in Postgresql. Thank you. -Sev
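To make the requested semantics concrete, here is a rough sketch of ActiveMQ-style topic matching in Python. This is purely illustrative (the function name and the rule that ">" must be the last segment and match at least one further segment are my assumptions), not part of any proposed patch:

```python
def topic_matches(subscription: str, channel: str) -> bool:
    """Return True if a dotted channel name matches a subscription
    pattern: '*' matches exactly one path segment, '>' (assumed to
    appear only as the last segment) matches one or more remaining
    segments."""
    sub = subscription.split(".")
    chan = channel.split(".")
    for i, part in enumerate(sub):
        if part == ">":
            # '>' matches everything from this point on.
            return i == len(sub) - 1 and len(chan) > i
        if i >= len(chan):
            return False
        if part != "*" and part != chan[i]:
            return False
    # Without a trailing '>', the paths must have the same depth.
    return len(chan) == len(sub)
```

With something like this inside the NOTIFY dispatch path, a single pg_notify('PRICE.STOCK.NASDAQ.IBM', ...) could reach both the PRICE.> listener and the PRICE.STOCK.*.IBM listener.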
Re: [HACKERS] Detection of nested function calls
Hugo Mercier hugo.merc...@oslandia.com writes: On 25/10/2013 17:20, Tom Lane wrote: How do you tell the difference between foo(col1, bar(col2)) foo(bar(col1), col2) Still not sure I understand ... I assume foo() takes two arguments of type A. bar() can take one argument of A or another type B. I was assuming everything was the same datatype in this example, ie col1, col2, and the result of bar() are all type A. The point I'm trying to make is that in the first case, foo would be receiving a first argument that was flat and a second that was not flat; while in the second case, it would be receiving a first argument that was not flat and a second that was flat. The expression labeling you're proposing does not help it tell the difference. What's more, you're proposing that the labeling be made by generic code that can't possibly know what bar() is really going to do. In bar(), you would have the choice to return either a plain A or a pointer to A. Because bar() knows its call is nested (by foo()), then it can decide to return a pointer to A. foo() is then evaluated and we assume it knows A can be a pointer. foo() then knows its nesting level is 0 and must return something serialized in that case. Whoa. That's the most fragile, assumption-filled way you could possibly go about this. In general, bar() cannot be expected to know whether the outer function is able to take a non-flat parameter value. And you've glossed over how foo() would know whether its input was flat or not. Another point here is that there's no good reason to suppose that a function should return a flattened value just because it's at the outer level of its syntactic expression. For example, if we're doing a plain SELECT foo(...) FROM ..., the next thing that will happen with that value is it'll be fed to the output function for the datatype. Maybe that output function would like to have a non-flat input value, too, to save the time of transforming back to that representation.
On the other hand, if it's a SELECT ... ORDER BY ... and the planner chooses to do the ORDER BY with a final sort step, we'll probably have to flatten the value to pass it through sorting. (Or possibly not --- perhaps we could just pass the toast token through sorting?) There are a lot of considerations here and it's really unreasonable to expect that static expression labeling will be able to do the right thing every time. Basically the only way to make this work reliably is for Datums to be self-identifying as to whether they're flat or structured values; then make code do the right thing on-the-fly at runtime depending on what kind of Datum it gets. Once you've done that, I don't see that parse-time labeling of expression nesting adds anything useful. As Andres said, the provisions for toasted datums are a good precedent, and none of that depends on parse-time decisions. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
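The idea of self-identifying Datums — each value knows at runtime whether it is flat or structured, and callers expand or flatten on demand, as with toasted datums — can be sketched outside PostgreSQL like this. This is an illustrative Python model with made-up names (the real mechanism would be C code alongside the detoast machinery), not the actual proposal:

```python
import json

class Datum:
    """Self-identifying value: either flat bytes or an in-memory
    structure. Callers ask the datum which form it is in, rather
    than inferring it from expression nesting at parse time."""
    def __init__(self, flat=None, expanded=None):
        self.flat = flat          # serialized bytes, or None
        self.expanded = expanded  # structured in-memory form, or None

    def is_flat(self):
        return self.expanded is None

    def get_expanded(self):
        # Analogous to detoasting: expand on demand, at runtime.
        if self.expanded is None:
            self.expanded = json.loads(self.flat)
        return self.expanded

    def get_flat(self):
        # Flatten only when a consumer (sort, output function) needs bytes.
        if self.flat is None:
            self.flat = json.dumps(self.expanded).encode()
        return self.flat

def array_set(datum, i, value):
    """A function that behaves identically whichever form it receives."""
    arr = list(datum.get_expanded())
    arr[i] = value
    return Datum(expanded=arr)   # pass the structured form onward
```

The point of the sketch: array_set never needs parse-time labeling; it does the right thing whether its input arrived flat (from disk) or expanded (from an inner call).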
Re: [HACKERS] CLUSTER FREEZE
On 10/24/2013 07:19 PM, Tom Lane wrote: In any case, it's very far from obvious to me that CLUSTER ought to throw away information by default, which is what you're proposing. The problem here is that you're thinking of the 1/10 of 1% of our users who have a serious PostgreSQL failure and post something on the lists for help, for which XID forensic information is useful. As opposed to the 99.9% of our users for whom deferred freezing is a performance burden. While I realize that the 0.1% of users are more likely to have contact with you, personally, it's still bad policy for the project. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql should show disabled internal triggers
On 2013-09-18 15:15:55 +0200, Bernd Helmle wrote: --On 18. September 2013 13:52:29 +0200 Andres Freund <andres@> wrote: If you do ALTER TABLE ... DISABLE TRIGGER ALL; and then individually re-enable the disabled triggers it's easy to miss internal triggers. A \d+ tablename will not show anything out of the ordinary for that situation since we don't show internal triggers. But foreign key checks won't work. So, how about displaying disabled internal triggers in psql? I had exactly the same concerns this morning while starting to look at the ENABLE/DISABLE constraint patch. However, I wouldn't display them as triggers, but maybe more generally as disabled constraints or such. Well, that will lead the user in the wrong direction, won't it? They haven't disabled the constraint but the trigger. Especially as we already have NOT VALID and might grow DISABLED for constraints themselves... Hi, The attached patch [1] enables psql to list internal disabled triggers in \d only in versions >= 9.0. [1] psql-display-all-triggers-v1.patch http://postgresql.1045698.n5.nabble.com/file/n5775954/psql-display-all-triggers-v1.patch Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello -- View this message in context: http://postgresql.1045698.n5.nabble.com/psql-should-show-disabled-internal-triggers-tp5771406p5775954.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Detection of nested function calls
On Fri, Oct 25, 2013 at 10:18 AM, Tom Lane t...@sss.pgh.pa.us wrote: Hugo Mercier hugo.merc...@oslandia.com writes: PostGIS functions that manipulate geometries have to unserialize their input geometries from the 'flat' varlena representation to their own, and serialize the processed geometries back when returning. But in such nested call queries, this serialization-unserialization process is just an overhead. This is a reasonable thing to worry about, not just for PostGIS types but for many container types such as arrays --- it'd be nice to be able to work with an in-memory representation that wasn't just a contiguous blob of data. For instance, assignment to an array element might become a constant-time operation even when working with variable-length datatypes. I bet numeric could benefit as well. Essentially all of the operations on numeric start by transforming the on-disk representation into an internal form used only for the duration of a single call, and end by transforming the internal form of the result back to the on-disk representation. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
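The per-call overhead Robert describes — every operation unpacking the on-disk form and repacking the result — can be illustrated with a toy pack/unpack cycle. This Python sketch uses a made-up binary layout, not PostgreSQL's actual numeric format:

```python
import struct

def unpack(flat):
    """Simulate transforming an on-disk value into a working form:
    a count followed by that many 16-bit digits."""
    n = struct.unpack("<i", flat[:4])[0]
    return list(struct.unpack(f"<{n}h", flat[4:4 + 2 * n]))

def pack(digits):
    """Simulate transforming the working form back to on-disk bytes."""
    return struct.pack("<i", len(digits)) + struct.pack(f"<{len(digits)}h", *digits)

def add_one(flat):
    """Every call pays a full unpack + pack, even mid-expression."""
    digits = unpack(flat)
    return pack([d + 1 for d in digits])

# add_one(add_one(x)) serializes and re-parses the intermediate
# result -- exactly the round trip that a non-flat in-memory
# representation for nested calls would avoid.
```

In a deeply nested expression the intermediate pack/unpack pairs are pure overhead; only the first unpack and the last pack are fundamentally necessary.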
[HACKERS] Additional information on log_line_prefix
Hi guys, I'm working on a quick conversion script for query reviews and I wonder if a feature request to add the following values would be possible:

%D = duration
%L = lock_time (lock only for this query)
%E = estimated rows
%R = total rows returned
%B = total bytes sent
%T = temporal tables used

Those prefixes/values are just examples/proposals. Thanks for the hard work! -- Emanuel Calvo
Re: [HACKERS] Additional information on log_line_prefix
emanuel_calvo wrote: %E = estimated rows How would you expect this to work? This information seems mostly useless without the context of a full EXPLAIN output. %T = temporal tables used I am guessing you mean temporary, not temporal - the latter also being known as time oriented. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Additional-information-on-log-line-prefix-tp5775956p5775958.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Additional information on log_line_prefix
On 10/25/2013 01:50 PM, Emanuel Calvo wrote: Hi guys, I'm working on a quick conversion script for query reviews and I wonder if a feature request to add the following values would be possible: %D = duration %L = lock_time (lock only for this query) %E = estimated rows %R = total rows returned %B = total bytes sent %T = temporal tables used Those prefixes/values are just examples/proposals. Thanks for the hard work! Pretty much all of this can be got with the auto_explain module, and I think that's where it belongs. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
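For reference, most of the requested numbers can indeed be captured today with auto_explain. A minimal configuration might look like this (the parameter names are from the auto_explain module; the thresholds are illustrative only):

```
# postgresql.conf -- illustrative auto_explain setup
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '250ms'   # log plans for statements slower than this
auto_explain.log_analyze = on             # include actual row counts and timings
auto_explain.log_buffers = on             # include buffer usage
```

With log_analyze on, the logged plans carry the duration, estimated rows, and actual rows that the %-escape proposal asked for.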
Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun
On 22.10.2013 19:55, Heikki Linnakangas wrote: I fixed the same problem in GiST a few years back, by making it tolerate missing downlinks, and inserting them lazily. The B-tree code tolerates them already on scans, but gets confused on insertion, as seen above. I propose that we use the same approach I used with GiST, and add a flag to the page header to indicate the downlink hasn't been inserted yet. When insertion (or vacuum) bumps into a flagged page, it can finish the incomplete action by inserting the downlink. This is what I came up with. One thing I'm not totally happy about is the way page deletions of incompletely split pages are handled. Basically, it just bails out and refuses to delete a page that is part of an incomplete split. That's probably OK in practice, as incomplete splits should be very rare anyway, but it's a bit dissatisfying to not handle the case because at first glance it seems like it should be even simpler than usual to delete a page that has no downlink. Nevertheless, I decided to just skip that for now. After this patch, deleting the only child of a parent and the parent itself is still a multi-WAL-record operation that needs to be tracked during recovery, and completed at the end of recovery. I'd like to eliminate that too, but that's another patch. - Heikki diff --git a/src/backend/access/nbtree/README b/src/backend/access/nbtree/README index 40f09e3..29d8bd1 100644 --- a/src/backend/access/nbtree/README +++ b/src/backend/access/nbtree/README @@ -384,12 +384,41 @@ an additional insertion above that, etc). For a root split, the followon WAL entry is a new root entry rather than an insertion entry, but details are otherwise much the same. -Because insertion involves multiple atomic actions, the WAL replay logic -has to detect the case where a page split isn't followed by a matching -insertion on the parent level, and then do that insertion on its own (and -recursively for any subsequent parent insertion, of course).
This is -feasible because the WAL entry for the split contains enough info to know -what must be inserted in the parent level. +Because insertion involves multiple atomic actions, it's possible that the +system crashes between splitting a page and inserting the downlink for the +new half to the parent. After recovery, the downlink for the new page will +be missing. The search algorithm works correctly, as the page will be found +by following the right-link from its left sibling, although if a lot of +downlinks in the tree are missing, performance will suffer. A more serious +consequence is that if the page without a downlink gets split again, the +insertion algorithm will fail to find the location in the parent level to +insert the downlink. + +Our approach is to create any missing downlinks along the way, when +searching the tree for a new insertion. It could be done during searches, +too, but it seems best not to put any extra writes in what would otherwise +be a read-only operation (it would not be possible in hot standby mode +anyway). To identify missing downlinks, when a page is split, the left page +is flagged to indicate that the split is not yet complete (INCOMPLETE_SPLIT). +When the downlink is inserted to the parent, the flag is atomically cleared. +The child page is kept locked until the insertion in the parent is finished +and the flag in the child cleared, but can be released immediately after +that, before recursing up the tree, if the parent also needs to be split. +That ensures that incompletely split pages should not be seen under normal +circumstances; only when a transaction fails to insert the parent for some +reason. + +We flag the left page, even though it's the right page that's missing the +downlink, because it's more convenient to know already when following the +right-link from the left page to the right page that it will need to have +its downlink inserted to the parent.
+ +We used to keep track of incomplete splits during recovery and finish them +immediately at end of recovery, instead of doing it lazily at the next +insertion. However, that made the recovery much more complicated, and only +fixed the problem when crash recovery was performed. An incomplete split can +also occur if an otherwise recoverable error, like out-of-memory or +out-of-disk-space, happens while inserting the downlink to the parent. When splitting a non-root page that is alone on its level, the required metapage update (of the fast root link) is performed and logged as part diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c index a452fea..a810015 100644 --- a/src/backend/access/nbtree/nbtinsert.c +++ b/src/backend/access/nbtree/nbtinsert.c @@ -59,14 +59,16 @@ static void _bt_findinsertloc(Relation rel, ScanKey scankey, IndexTuple newtup, Relation heapRel); -static void _bt_insertonpg(Relation rel, Buffer buf, +static void _bt_insertonpg(Relation rel, Buffer buf, Buffer
Re: [HACKERS] CLUSTER FREEZE
On 25 October 2013 01:17, Josh Berkus j...@agliodbs.com wrote: On 10/24/2013 04:55 PM, Robert Haas wrote: I wonder if we should go so far as to make this the default behavior, instead of just making it an option. +1 from me. Can you think of a reason you *wouldn't* want to freeze? Ok, I attach an alternative patch that makes CLUSTER *always* freeze, without any option (but doesn't affect VACUUM FULL in the same way). I will post both alternatives to the commitfest app since there seems to be some disagreement about whether tuple freezing should be optional. Thanks Thomas Munro cluster-freeze-always.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.4
On Fri, Oct 25, 2013 at 10:58 AM, Andres Freund and...@2ndquadrant.com wrote: So, I am currently wondering about how to store the old tuple, based on this. Currently it is stored using the TupleDesc of the index the old tuple is based on. But if we want to allow transporting the entire tuple that obviously cannot be the only option. One option would be to change the stored format based on what's configured, using the relation's TupleDesc if FULL is used. But I think always using the heap relation's desc is better. I heartily agree. The not-logged columns would then just be represented as NULLs. That will make old primary keys bigger if the relation has a high number of columns and the key small, but I don't think it matters enough. Even if it does matter, the cure seems likely to be worse than the disease. My only other comment is that if NONE is selected, we ought to omit the old tuple altogether, not store one that is all-nulls. But I bet you had that in mind anyway. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
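The representation being converged on here — store the old tuple in the heap relation's column order, with NULLs for columns not chosen for logging, and omit the tuple entirely for NONE — can be sketched like this (illustrative Python with hypothetical names, not the patch's actual code):

```python
def old_tuple_for_wal(all_columns, logged_columns, old_row):
    """Build the old-tuple image for the WAL record using the heap
    relation's full column list. Columns outside the logged set
    (e.g. outside the replica identity key) become NULL (None).
    Returns None when NONE is selected, rather than an all-NULL row."""
    if not logged_columns:
        return None   # NONE selected: omit the old tuple entirely
    return [old_row[c] if c in logged_columns else None
            for c in all_columns]
```

Using the heap TupleDesc everywhere means decoding code needs only one tuple layout per relation, at the cost of a few NULL columns when the logged key is small.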
[HACKERS] dsm use of uint64
When I wrote the dynamic shared memory patch, I used uint64 everywhere to measure sizes - rather than, as we do for the main shared memory segment, Size. This now seems to me to have been the wrong decision; I'm finding that it's advantageous to make dynamic shared memory behave as much like the main shared memory segment as is reasonably possible, and using Size facilitates the use of MAXALIGN(), TYPEALIGN(), etc. as well as things like add_size() and mul_size() which are just as relevant in the dynamic shared memory case as they are for the main shared memory segment. Therefore, I propose to apply the attached patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c index e516197..4d7fb08 100644 --- a/src/backend/storage/ipc/dsm.c +++ b/src/backend/storage/ipc/dsm.c @@ -67,7 +67,7 @@ struct dsm_segment uint32 control_slot; /* Slot in control segment. */ void *impl_private; /* Implementation-specific private data. */ void *mapped_address; /* Mapping address, or NULL if unmapped. */ - uint64 mapped_size; /* Size of our mapping. */ + Size mapped_size; /* Size of our mapping. */ }; /* Shared-memory state for a dynamic shared memory segment. */ @@ -94,7 +94,7 @@ static void dsm_postmaster_shutdown(int code, Datum arg); static void dsm_backend_shutdown(int code, Datum arg); static dsm_segment *dsm_create_descriptor(void); static bool dsm_control_segment_sane(dsm_control_header *control, - uint64 mapped_size); + Size mapped_size); static uint64 dsm_control_bytes_needed(uint32 nitems); /* Has this backend initialized the dynamic shared memory system yet? 
*/ @@ -128,7 +128,7 @@ static dlist_head dsm_segment_list = DLIST_STATIC_INIT(dsm_segment_list); */ static dsm_handle dsm_control_handle; static dsm_control_header *dsm_control; -static uint64 dsm_control_mapped_size = 0; +static Size dsm_control_mapped_size = 0; static void *dsm_control_impl_private = NULL; /* @@ -142,7 +142,7 @@ dsm_postmaster_startup(void) { void *dsm_control_address = NULL; uint32 maxitems; - uint64 segsize; + Size segsize; Assert(!IsUnderPostmaster); @@ -214,8 +214,8 @@ dsm_cleanup_using_control_segment(void) void *junk_mapped_address = NULL; void *impl_private = NULL; void *junk_impl_private = NULL; - uint64 mapped_size = 0; - uint64 junk_mapped_size = 0; + Size mapped_size = 0; + Size junk_mapped_size = 0; uint32 nitems; uint32 i; dsm_handle old_control_handle; @@ -453,7 +453,7 @@ dsm_postmaster_shutdown(int code, Datum arg) void *dsm_control_address; void *junk_mapped_address = NULL; void *junk_impl_private = NULL; - uint64 junk_mapped_size = 0; + Size junk_mapped_size = 0; /* * If some other backend exited uncleanly, it might have corrupted the @@ -562,7 +562,7 @@ dsm_backend_startup(void) * Create a new dynamic shared memory segment. */ dsm_segment * -dsm_create(uint64 size) +dsm_create(Size size) { dsm_segment *seg = dsm_create_descriptor(); uint32 i; @@ -733,7 +733,7 @@ dsm_backend_shutdown(int code, Datum arg) * address. For the caller's convenience, we return the mapped address. */ void * -dsm_resize(dsm_segment *seg, uint64 size) +dsm_resize(dsm_segment *seg, Size size) { Assert(seg->control_slot != INVALID_CONTROL_SLOT); dsm_impl_op(DSM_OP_RESIZE, seg->handle, size, seg->impl_private, @@ -887,7 +887,7 @@ dsm_segment_address(dsm_segment *seg) /* * Get the size of a mapping. */ -uint64 +Size dsm_segment_map_length(dsm_segment *seg) { Assert(seg->mapped_address != NULL); @@ -947,7 +947,7 @@ dsm_create_descriptor(void) * our segments at all.
*/ static bool -dsm_control_segment_sane(dsm_control_header *control, uint64 mapped_size) +dsm_control_segment_sane(dsm_control_header *control, Size mapped_size) { if (mapped_size < offsetof(dsm_control_header, item)) return false; /* Mapped size too short to read header. */ diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c index 627f00b..2056668 100644 --- a/src/backend/storage/ipc/dsm_impl.c +++ b/src/backend/storage/ipc/dsm_impl.c @@ -69,24 +69,24 @@ #include "utils/memutils.h" #ifdef USE_DSM_POSIX -static bool dsm_impl_posix(dsm_op op, dsm_handle handle, uint64 request_size, +static bool dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size, void **impl_private, void **mapped_address, - uint64 *mapped_size, int elevel); + Size *mapped_size, int elevel); #endif #ifdef USE_DSM_SYSV -static bool dsm_impl_sysv(dsm_op op, dsm_handle handle, uint64 request_size, +static bool dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size, void **impl_private, void **mapped_address, - uint64 *mapped_size, int elevel); + Size *mapped_size, int elevel); #endif #ifdef USE_DSM_WINDOWS -static bool dsm_impl_windows(dsm_op op, dsm_handle