Re: [HACKERS] On partitioning
On 12/2/14, 9:43 PM, Amit Langote wrote: What is an overflow partition and why do we want that? That would be a default partition. That is, where the tuples that don't belong elsewhere (other defined partitions) go. VALUES clause of the definition for such a partition would look like: (a range partition) ... VALUES LESS THAN MAXVALUE (a list partition) ... VALUES DEFAULT There has been discussion about whether there shouldn't be such a place for tuples to go. That is, it should generate an error if a tuple can't go anywhere (or support auto-creating a new one like in interval partitioning?) If we are going to do this, should the data just go into the parent? That's what would happen today. FWIW, I think an overflow would be useful, but there should be a way to (dis|en)able it. What are you going to do if the partitioning key has two columns of different data types? Sorry, this totally eluded me. Perhaps, the 'values' needs some more thought. They are one of the most crucial elements of the scheme. I wonder if your suggestion of pg_node_tree plays well here. This then could be a list of CONSTs or some such... And I am thinking it's a concern only for range partitions, no? (that is, a multicolumn partition key) I think partkind switches the interpretation of the field as appropriate. Am I missing something? By the way, I had mentioned we could have two values fields each for range and list partition kind. The more SQL way would be records (composite types). That would make catalog inspection a LOT easier and presumably make it easier to change the partitioning key (I'm assuming ALTER TYPE cascades to stored data). Records are stored internally as tuples; not sure if that would be faster than a List of Consts or a pg_node_tree. Nodes would theoretically allow using things other than Consts, but I suspect that would be a bad idea. Something else to consider... our user-space support for ranges is now rangetypes, so perhaps that's what we should use for range partitioning. The up-side (which would be a double-edged sword) is that you could leave holes in your partitioning map. Note that in the multi-key case we could still have a record of rangetypes. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing INNER JOINs
On 3 December 2014 at 08:13, Robert Haas robertmh...@gmail.com wrote: On Sun, Nov 30, 2014 at 12:51 PM, Tom Lane t...@sss.pgh.pa.us wrote: Bottom line, given all the restrictions on whether the optimization can happen, I have very little enthusiasm for the whole idea. I do not think the benefit will be big enough to justify the amount of mess this will introduce. This optimization applies to a tremendous number of real-world cases, and we really need to have it. This was a huge problem for me in my previous life as a web developer. The previous work that we did to remove LEFT JOINs was an enormous help, but it's not enough; we need a way to remove INNER JOINs as well. I thought that David's original approach of doing this in the planner was a good one. That fell down because of the possibility that apparently-valid referential integrity constraints might not be valid at execution time if the triggers were deferred. But frankly, that seems like an awfully nitpicky thing for this to fall down on. Lots of web applications are going to issue only SELECT statements that run as as single-statement transactions, and so that issue, so troubling in theory, will never occur in practice. That doesn't mean that we don't need to account for it somehow to make the code safe, but any argument that it abridges the use case significantly is, in my opinion, not credible. Anyway, David was undeterred by the rejection of that initial approach and rearranged everything, based on suggestions from Andres and later Simon, into the form it's reached now. Kudos to him for his persistance. But your point that we might have chosen a whole different plan if it had known that this join was cheaper is a good one. However, that takes us right back to square one, which is to do this at plan time. I happen to think that's probably better anyway, but I fear we're just going around in circles here. We can either do it at plan time and find some way of handling the fact that there might be deferred triggers that haven't fired yet; or we can do it at execution time and live with the fact that we might have chosen a plan that is not optimal, though still better than executing a completely-unnecessary join. Just so that I don't end up going around in circles again, let me summarise my understanding of the pros and cons of each of the states that this patch has been in. *** Method 1: Removing Inner Joins at planning time: Pros: 1. Plan generated should be optimal, i.e should generate the same plan for the query as if the removed relations were never included in the query's text. 2. On successful join removal planning likely will be faster as there's less paths to consider having fewer relations and join combinations. Cons: 1. Assumptions must be made during planning about the trigger queue being empty or not. During execution, if there are pending fk triggers which need to be executed then we could produce wrong results. *** Method 2: Marking scans as possibly skippable during planning, and skipping joins at execution (Andres' method) Pros: 1. The plan can be executed as normal if there are any foreign key triggers pending. Cons: 1. Planner may not generate optimal plan. e.g sort nodes may be useless for Merge joins 2. Code needed to be added to all join methods to allow skipping, nested loop joins suffered from a small overhead. 3. Small overhead from visiting extra nodes in the plan which would not be present if those nodes had been removed. 4. Problems writing regression tests due to having to use EXPLAIN ANALYZE to try to work out what's going on, and the output containing variable runtime values. *** Method 3: Marking scans as possibly skippable during planning and removing redundant join nodes at executor startup (Simon's method) Pros: 1. The plan can be executed as normal if there are any foreign key triggers pending. 2. Does not require extra code in all join types (see cons #2 above) 3. Does not suffer from extra node visiting overhead (see cons #3 above) Cons: 1. Executor must modify the plan. 2. Planner may have generated a plan which is not optimal for modification by the executor (e.g. Sort nodes for merge join, or index scans for pre-sorted input won't become seqscans which may be more efficient as ordering may not be required after removing a merge join) With each of the methods listed above, someone has had a problem with, and from the feedback given I've made changes based and ended up with the next revision of the patch. Tom has now pointed out that he does not like the executor modifying the plan, which I agree with to an extent as it I really do hate the extra useless nodes that I'm unable to remove from the plan. I'd like to propose Method 4 which I believe solves quite a few of the problems seen in the other method. Method 4: (Which is I think what Mart had in mind, I've only expanded on it a bit with thoughts about possible implementations methods) 1.
Re: [HACKERS] Sequence Access Method WIP
On 12/02/2014 08:21 PM, Andres Freund wrote: [snip] 2. Instead of the single amdata field, make it possible for the implementation to define any number of fields with any datatype in the tuple. That would make debugging, monitoring etc. easier. My main problem with that approach is that it pretty much nails the door shut for moving sequences into a catalog table instead of the current, pretty insane, approach of a physical file per sequence. Hmm... having done my fair bit of testing, I can say that this isn't actually that bad (though depends heavily on the underlying filesystem and workload, of course) With this approach, I fear extreme I/O contention with an update-heavy workload... unless all sequence activity is finally WAL-logged and hence writes to the actual files become mostly sequential and asynchronous. May I possibly suggest a file-per-schema model instead? This approach would certainly solve the excessive i-node consumption problem that --I guess-- Andres is trying to address here. Moreover, the one file per schema for sequences solution would fit a quite common model of grouping tables (in schemas) for physical [tablespace] location purposes Currently, with our without seqam, it'd not be all that hard to force it into a catalog, taking care to to force each tuple onto a separate page... IMHO, this is jst as wasteful as the current approach (one-page file per sequence) in terms of disk usage and complicates the code a bit but I really don't see how we can have more than one sequence state per page without severe (page) locking problems. However, someone with deeper knowledge of page pinning and buffer manager internals could certainly devise a better solution... Just my 2c Thanks, / J.L. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sequence Access Method WIP
On 2014-12-03 10:59:50 +0100, José Luis Tallón wrote: On 12/02/2014 08:21 PM, Andres Freund wrote: [snip] 2. Instead of the single amdata field, make it possible for the implementation to define any number of fields with any datatype in the tuple. That would make debugging, monitoring etc. easier. My main problem with that approach is that it pretty much nails the door shut for moving sequences into a catalog table instead of the current, pretty insane, approach of a physical file per sequence. Hmm... having done my fair bit of testing, I can say that this isn't actually that bad (though depends heavily on the underlying filesystem and workload, of course) With this approach, I fear extreme I/O contention with an update-heavy workload... unless all sequence activity is finally WAL-logged and hence writes to the actual files become mostly sequential and asynchronous. I don't think the WAL logging would need to change much in comparison to the current solution. We'd just add the page number to the WAL record. The biggest advantage would be to require fewer heavyweight locks, because the pg_sequence one could be a single fastpath lock. Currently we have to take the sequence's relation lock (heavyweight) and then the the page level lock (lwlock) for every single sequence used. May I possibly suggest a file-per-schema model instead? This approach would certainly solve the excessive i-node consumption problem that --I guess-- Andres is trying to address here. I don't think that really has any advantages. Currently, with our without seqam, it'd not be all that hard to force it into a catalog, taking care to to force each tuple onto a separate page... IMHO, this is jst as wasteful as the current approach (one-page file per sequence) in terms of disk usage and complicates the code a bit but I really don't see how we can have more than one sequence state per page without severe (page) locking problems. The overhead of a file is much more than wasting the remainder of a page. Alone the requirement of separate fsyncs and everything is pretty bothersome. The generated IO patterns are also much worse... However, someone with deeper knowledge of page pinning and buffer manager internals could certainly devise a better solution... I think there's pretty much no chance of accepting more than one page per Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] inherit support for foreign tables
On Tue, Dec 2, 2014 at 8:29 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: (2014/11/28 18:14), Ashutosh Bapat wrote: On Thu, Nov 27, 2014 at 3:52 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp mailto:fujita.ets...@lab.ntt.co.jp wrote: Apart from the above, I noticed that the patch doesn't consider to call ExplainForeignModify during EXPLAIN for an inherited UPDATE/DELETE, as shown below (note that there are no UPDATE remote queries displayed): So, I'd like to modify explain.c to show those queries like this: postgres=# explain verbose update parent set a = a * 2 where a = 5; QUERY PLAN --__ --__- Update on public.parent (cost=0.00..280.77 rows=25 width=10) - Seq Scan on public.parent (cost=0.00..0.00 rows=1 width=10) Output: (parent.a * 2), parent.ctid Filter: (parent.a = 5) Remote SQL: UPDATE public.mytable_1 SET a = $2 WHERE ctid = $1 - Foreign Scan on public.ft1 (cost=100.00..140.38 rows=12 width=10) Output: (ft1.a * 2), ft1.ctid Remote SQL: SELECT a, ctid FROM public.mytable_1 WHERE ((a = 5)) FOR UPDATE Remote SQL: UPDATE public.mytable_2 SET a = $2 WHERE ctid = $1 - Foreign Scan on public.ft2 (cost=100.00..140.38 rows=12 width=10) Output: (ft2.a * 2), ft2.ctid Remote SQL: SELECT a, ctid FROM public.mytable_2 WHERE ((a = 5)) FOR UPDATE (12 rows) Two remote SQL under a single node would be confusing. Also the node is labelled as Foreign Scan. It would be confusing to show an UPDATE command under this scan node. I thought this as an extention of the existing (ie, non-inherited) case (see the below example) to the inherited case. postgres=# explain verbose update ft1 set a = a * 2 where a = 5; QUERY PLAN - Update on public.ft1 (cost=100.00..140.38 rows=12 width=10) Remote SQL: UPDATE public.mytable_1 SET a = $2 WHERE ctid = $1 - Foreign Scan on public.ft1 (cost=100.00..140.38 rows=12 width=10) Output: (a * 2), ctid Remote SQL: SELECT a, ctid FROM public.mytable_1 WHERE ((a = 5)) FOR UPDATE (5 rows) I think we should show update commands somewhere for the inherited case as for the non-inherited case. Alternatives to this are welcome. This is not exactly extension of non-inheritance case. non-inheritance case doesn't show two remote SQLs under the same plan node. May be you can rename the label Remote SQL as Remote UPDATE/INSERT/DELETE (or something to that effect) for the DML command and the Foreign plan node should be renamed to Foreign access node or something to indicate that it does both the scan as well as DML. I am not keen about the actual terminology, but I think a reader of plan shouldn't get confused. We can leave this for committer's judgement. BTW, I was experimenting with DMLs being executed on multiple FDW server under same transaction and found that the transactions may not be atomic (and may be inconsistent), if one or more of the server fails to commit while rest of them commit the transaction. The reason for this is, we do not rollback the already committed changes to the foreign server, if one or more of them fail to commit a transaction. With foreign tables under inheritance hierarchy a single DML can span across multiple servers and the result may not be atomic (and may be inconsistent). So, IIUC, even the transactions over the local and the *single* remote server are not guaranteed to be executed atomically in the current form. It is possible that the remote transaction succeeds and the local one fails, for example, resulting in data inconsistency between the local and the remote. IIUC, while committing transactions involving a single remote server, the steps taken are as follows 1. the local changes are brought to PRE-COMMIT stage, which means that the transaction *will* succeed locally after successful completion of this phase, 2. COMMIT message is sent to the foreign server 3. If step two succeeds, local changes are committed and successful commit is conveyed to the client 4. if step two fails, local changes are rolled back and abort status is conveyed to the client 5. If step 1 itself fails, the remote changes are rolled back. This is as per one phase commit protocol which guarantees ACID for single foreign data source. So, the changes involving local and a single foreign server seem to be atomic and consistent. either we have to disable DMLs on an inheritance hierarchy which spans multiple servers. OR make sure that such transactions follow 2PC norms. -1 for disabling update queries on such an inheritance hierarchy because I think we
Re: [HACKERS] inherit support for foreign tables
On Wed, Dec 3, 2014 at 4:05 PM, Ashutosh Bapat ashutosh.ba...@enterprisedb.com wrote: On Tue, Dec 2, 2014 at 8:29 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: (2014/11/28 18:14), Ashutosh Bapat wrote: On Thu, Nov 27, 2014 at 3:52 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp mailto:fujita.ets...@lab.ntt.co.jp wrote: Apart from the above, I noticed that the patch doesn't consider to call ExplainForeignModify during EXPLAIN for an inherited UPDATE/DELETE, as shown below (note that there are no UPDATE remote queries displayed): So, I'd like to modify explain.c to show those queries like this: postgres=# explain verbose update parent set a = a * 2 where a = 5; QUERY PLAN --__ --__- Update on public.parent (cost=0.00..280.77 rows=25 width=10) - Seq Scan on public.parent (cost=0.00..0.00 rows=1 width=10) Output: (parent.a * 2), parent.ctid Filter: (parent.a = 5) Remote SQL: UPDATE public.mytable_1 SET a = $2 WHERE ctid = $1 - Foreign Scan on public.ft1 (cost=100.00..140.38 rows=12 width=10) Output: (ft1.a * 2), ft1.ctid Remote SQL: SELECT a, ctid FROM public.mytable_1 WHERE ((a = 5)) FOR UPDATE Remote SQL: UPDATE public.mytable_2 SET a = $2 WHERE ctid = $1 - Foreign Scan on public.ft2 (cost=100.00..140.38 rows=12 width=10) Output: (ft2.a * 2), ft2.ctid Remote SQL: SELECT a, ctid FROM public.mytable_2 WHERE ((a = 5)) FOR UPDATE (12 rows) Two remote SQL under a single node would be confusing. Also the node is labelled as Foreign Scan. It would be confusing to show an UPDATE command under this scan node. I thought this as an extention of the existing (ie, non-inherited) case (see the below example) to the inherited case. postgres=# explain verbose update ft1 set a = a * 2 where a = 5; QUERY PLAN - Update on public.ft1 (cost=100.00..140.38 rows=12 width=10) Remote SQL: UPDATE public.mytable_1 SET a = $2 WHERE ctid = $1 - Foreign Scan on public.ft1 (cost=100.00..140.38 rows=12 width=10) Output: (a * 2), ctid Remote SQL: SELECT a, ctid FROM public.mytable_1 WHERE ((a = 5)) FOR UPDATE (5 rows) I think we should show update commands somewhere for the inherited case as for the non-inherited case. Alternatives to this are welcome. This is not exactly extension of non-inheritance case. non-inheritance case doesn't show two remote SQLs under the same plan node. May be you can rename the label Remote SQL as Remote UPDATE/INSERT/DELETE (or something to that effect) for the DML command and the Foreign plan node should be renamed to Foreign access node or something to indicate that it does both the scan as well as DML. I am not keen about the actual terminology, but I think a reader of plan shouldn't get confused. We can leave this for committer's judgement. BTW, I was experimenting with DMLs being executed on multiple FDW server under same transaction and found that the transactions may not be atomic (and may be inconsistent), if one or more of the server fails to commit while rest of them commit the transaction. The reason for this is, we do not rollback the already committed changes to the foreign server, if one or more of them fail to commit a transaction. With foreign tables under inheritance hierarchy a single DML can span across multiple servers and the result may not be atomic (and may be inconsistent). So, IIUC, even the transactions over the local and the *single* remote server are not guaranteed to be executed atomically in the current form. It is possible that the remote transaction succeeds and the local one fails, for example, resulting in data inconsistency between the local and the remote. IIUC, while committing transactions involving a single remote server, the steps taken are as follows 1. the local changes are brought to PRE-COMMIT stage, which means that the transaction *will* succeed locally after successful completion of this phase, 2. COMMIT message is sent to the foreign server 3. If step two succeeds, local changes are committed and successful commit is conveyed to the client 4. if step two fails, local changes are rolled back and abort status is conveyed to the client 5. If step 1 itself fails, the remote changes are rolled back. This is as per one phase commit protocol which guarantees ACID for single foreign data source. So, the changes involving local and a single foreign server seem to be atomic and consistent. either we have to disable DMLs on an inheritance hierarchy which spans multiple servers. OR make sure that such
Re: [HACKERS] Removing INNER JOINs
On 3 December 2014 at 09:29, David Rowley dgrowle...@gmail.com wrote: *** Method 3: Marking scans as possibly skippable during planning and removing redundant join nodes at executor startup (Simon's method) Pros: 1. The plan can be executed as normal if there are any foreign key triggers pending. 2. Does not require extra code in all join types (see cons #2 above) 3. Does not suffer from extra node visiting overhead (see cons #3 above) Cons: 1. Executor must modify the plan. 2. Planner may have generated a plan which is not optimal for modification by the executor (e.g. Sort nodes for merge join, or index scans for pre-sorted input won't become seqscans which may be more efficient as ordering may not be required after removing a merge join) With each of the methods listed above, someone has had a problem with, and from the feedback given I've made changes based and ended up with the next revision of the patch. Tom has now pointed out that he does not like the executor modifying the plan, which I agree with to an extent as it I really do hate the extra useless nodes that I'm unable to remove from the plan. I guess we need an Option node. Tom and I discussed that about an aeon ago. The Option node has a plan for each situation. At execution time, we make the test specified in the plan and then select the appropriate subplan. That way we can see what is happening in the plan and the executor doesn't need to edit anything. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing INNER JOINs
On Wed, Dec 3, 2014 at 5:00 PM, Simon Riggs si...@2ndquadrant.com wrote: On 3 December 2014 at 09:29, David Rowley dgrowle...@gmail.com wrote: *** Method 3: Marking scans as possibly skippable during planning and removing redundant join nodes at executor startup (Simon's method) Pros: 1. The plan can be executed as normal if there are any foreign key triggers pending. 2. Does not require extra code in all join types (see cons #2 above) 3. Does not suffer from extra node visiting overhead (see cons #3 above) Cons: 1. Executor must modify the plan. 2. Planner may have generated a plan which is not optimal for modification by the executor (e.g. Sort nodes for merge join, or index scans for pre-sorted input won't become seqscans which may be more efficient as ordering may not be required after removing a merge join) With each of the methods listed above, someone has had a problem with, and from the feedback given I've made changes based and ended up with the next revision of the patch. Tom has now pointed out that he does not like the executor modifying the plan, which I agree with to an extent as it I really do hate the extra useless nodes that I'm unable to remove from the plan. I guess we need an Option node. Tom and I discussed that about an aeon ago. The Option node has a plan for each situation. At execution time, we make the test specified in the plan and then select the appropriate subplan. That way we can see what is happening in the plan and the executor doesn't need to edit anything. So the planner keeps all possibility satisfying plans, or it looks at the possible conditions (like presence of foreign key for this case, for eg) and then lets executor choose between them? So is the idea essentially making the planner return a set of best plans, one for each condition? Are we assured of their optimality at the local level i.e. at each possibility? IMO this sounds like punting the planner's task to executor. Not to mention some overhead for maintaining various plans that might have been discarded early in the planning and path cost evaluation phase (consider a path with pathkeys specified, like with ORDINALITY. Can there be edge cases where we might end up invalidating the entire path if we let executor modify it, or, maybe just lose the ordinality optimization?) I agree that executor should not modify plans, but letting executor choose the plan to execute (out of a set from planner, of course) rather than planner giving executor a single plan and executor not caring about the semantics, seems a bit counterintuitive to me. It might be just me though. Regards, Atri -- Regards, Atri *l'apprenant*
Re: [HACKERS] About xmllint checking for the validity of postgres.xml in 9.5
Michael Paquier wrote: Hi all, Since commit 5d93ce2d, the output of xmllint is checked by passing --valid to it. Isn't that a regression with what we were doing for pre-9.4 versions? For example, with 9.4 and older versions it is possible to compile man pages even if the xml spec is not entirely valid when using docbook 4.2. I don't think this is a regression. It just means we're stricter than before. Is there a reason behind this tinkering? -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Fix whitespace
Peter Eisentraut wrote: Fix whitespace Does this mean you're happy with it now? Does make check-world pass for you? -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On partitioning
Amit Langote wrote: From: Robert Haas [mailto:robertmh...@gmail.com] What is an overflow partition and why do we want that? That would be a default partition. That is, where the tuples that don't belong elsewhere (other defined partitions) go. VALUES clause of the definition for such a partition would look like: (a range partition) ... VALUES LESS THAN MAXVALUE (a list partition) ... VALUES DEFAULT There has been discussion about whether there shouldn't be such a place for tuples to go. That is, it should generate an error if a tuple can't go anywhere (or support auto-creating a new one like in interval partitioning?) In my design I initially had overflow partitions too, because I inherited the idea from Itagaki Takahiro's patch. Eventually I realized that it's a useless concept, because you can always have leftmost and rightmost partitions, which are just regular partitions (except they don't have a low key, resp. high key). If you don't define unbounded partitions at either side, it's fine, you just raise an error whenever the user tries to insert a value for which there is no partition. Not real clear to me how this applies to list partitioning, but I have the hunch that it'd be better to deal with that without overflow partitions as well. BTW I think auto-creating partitions is a bad idea in general, because you get into lock escalation mess and furthermore you have to waste time checking for existance beforehand, which lowers performance. Just have a very easy command that users can run ahead of time (something like CREATE PARTITION FOR VALUE now() + '30 days', whatever), and preferrably one that doesn't fail if the partition already exist; that way, users can have (for instance) a daily create-30-partitions-ahead procedure which most days would only create one partition (the one for 30 days in the future) but whenever the odd case happens that the server is turned off just at that time someday, it creates two -- one belt, 29 suspenders. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Many processes blocked at ProcArrayLock
Xiaoyulei wrote: I put all the stack in attachment. Not sure that this is really all that useful. At least I don't have the patience to examine all this, and I'm not sure it contains the needed info in the first place. If you were to ensure your build is using -fno-omit-frame-pointer in cflags and then used perf record -a -g while the test runs and perf report -g once it's finished, you'd get a useful profile that would show who is acquiring the problematic lock and why. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About xmllint checking for the validity of postgres.xml in 9.5
On Wed, Dec 3, 2014 at 9:43 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Michael Paquier wrote: Hi all, Since commit 5d93ce2d, the output of xmllint is checked by passing --valid to it. Isn't that a regression with what we were doing for pre-9.4 versions? For example, with 9.4 and older versions it is possible to compile man pages even if the xml spec is not entirely valid when using docbook 4.2. I don't think this is a regression. It just means we're stricter than before. Is there a reason behind this tinkering? Just got surprised by how we got strict on master when doing a build of the docs using docbook 4.2 and some old versions of docbook-dsssl and docbook-xsl: man pages can still compile even if the spec is not exactly correct sometimes. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sequence Access Method WIP
On 02/12/14 20:21, Andres Freund wrote: On 2014-11-24 13:16:24 +0200, Heikki Linnakangas wrote: To be clear: I don't think this API is very good for its stated purpose, for implementing global sequences for use in a cluster. For the reasons I've mentioned before. I'd like to see two changes to this proposal: ... 2. Instead of the single amdata field, make it possible for the implementation to define any number of fields with any datatype in the tuple. That would make debugging, monitoring etc. easier. My main problem with that approach is that it pretty much nails the door shut for moving sequences into a catalog table instead of the current, pretty insane, approach of a physical file per sequence. Currently, with our without seqam, it'd not be all that hard to force it into a catalog, taking care to to force each tuple onto a separate page... I don't know, I think if we decide to change storage format we can do serialization/conversion in seqam layer, it does not seem to matter too much if the serialization into some opaque type is done in AM itself or by the API layer. Or we can have one relation for all sequences in single AM, etc. In general I don't think that the custom columns for AM approach prohibits future storage changes. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On partitioning
On Wed, Dec 03, 2014 at 10:00:26AM -0300, Alvaro Herrera wrote: Amit Langote wrote: From: Robert Haas [mailto:robertmh...@gmail.com] What is an overflow partition and why do we want that? That would be a default partition. That is, where the tuples that don't belong elsewhere (other defined partitions) go. VALUES clause of the definition for such a partition would look like: (a range partition) ... VALUES LESS THAN MAXVALUE (a list partition) ... VALUES DEFAULT There has been discussion about whether there shouldn't be such a place for tuples to go. That is, it should generate an error if a tuple can't go anywhere (or support auto-creating a new one like in interval partitioning?) In my design I initially had overflow partitions too, because I inherited the idea from Itagaki Takahiro's patch. Eventually I realized that it's a useless concept, because you can always have leftmost and rightmost partitions, which are just regular partitions (except they don't have a low key, resp. high key). If you don't define unbounded partitions at either side, it's fine, you just raise an error whenever the user tries to insert a value for which there is no partition. Hi, Maybe I am not clear on the concept of an overflow partition, but I thought that it functioned to catch any record that did not fit the partitioning scheme. You end of range with out a low key or high key would only catch problems in those areas. If you partitioned on work days of the week, you should not have anything on Saturday/Sunday. How would that work? You would want to catch anything that was not a weekday in the overflow. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Dec 2, 2014 at 5:44 PM, Tom Lane t...@sss.pgh.pa.us wrote: Peter Geoghegan p...@heroku.com writes: On Tue, Dec 2, 2014 at 2:21 PM, Robert Haas robertmh...@gmail.com wrote: Right, and what I'm saying is that maybe the applicability flag shouldn't be stored in the SortSupport object, but passed down as an argument. But then how does that information get to any given sortsupport routine? That's the place that really needs to know if abbreviation is useful. In general, they're only passed a SortSupport object. Are you suggesting revising the signature required of SortSupport routines to add that extra flag as an additional argument? I think that is what he's suggesting, and I too am wondering why it's a good idea. I find it somewhat confusing that we've got one flag which is only used from the time the SortSupport object is created until the time that it's fully initialized, and then a different way of indicating whether we paid attention to that flag. I'm not totally sure what the right solution to that problem is, but the current situation feels like something of a wart. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sequence Access Method WIP
On 12/03/2014 11:24 AM, Andres Freund wrote: On 2014-12-03 10:59:50 +0100, José Luis Tallón wrote: snip] I don't think the WAL logging would need to change much in comparison to the current solution. We'd just add the page number to the WAL record. The biggest advantage would be to require fewer heavyweight locks, because the pg_sequence one could be a single fastpath lock. Currently we have to take the sequence's relation lock (heavyweight) and then the the page level lock (lwlock) for every single sequence used. Got it, thank you for the explanation. May I possibly suggest a file-per-schema model instead? This approach would certainly solve the excessive i-node consumption problem that --I guess-- Andres is trying to address here. I don't think that really has any advantages. Just spreading the I/O load, nothing more, it seems: Just to elaborate a bit on the reasoning, for completeness' sake: Given that a relation's segment maximum size is 1GB, we'd have (1048576/8)=128k sequences per relation segment. Arguably, not many real use cases will have that many sequences save for *massively* multi-tenant databases. The downside being that all that random I/O --- in general, it can't really be sequential unless there are very very few sequences--- can't be spread to other spindles. Create a sequence_default_tablespace GUC + ALTER SEQUENCE SET TABLESPACE, to use an SSD for this purpose maybe? (I could take a shot at the patch, if deemed worthwhile) [snip] The overhead of a file is much more than wasting the remainder of a page. Alone the requirement of separate fsyncs and everything is pretty bothersome. The generated IO patterns are also much worse... Yes, you are right. I stand corrected. [snip] I think there's pretty much no chance of accepting more than one page per sequence Definitively. Thanks, J.L. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tracking commit timestamps
Pushed with some extra cosmetic tweaks. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tracking commit timestamps
On 03/12/14 15:54, Alvaro Herrera wrote: Pushed with some extra cosmetic tweaks. Cool, thanks! -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing INNER JOINs
* Atri Sharma (atri.j...@gmail.com) wrote: So the planner keeps all possibility satisfying plans, or it looks at the possible conditions (like presence of foreign key for this case, for eg) and then lets executor choose between them? Right, this was one of the thoughts that I had. So is the idea essentially making the planner return a set of best plans, one for each condition? Are we assured of their optimality at the local level i.e. at each possibility? We *already* have an idea of there being multiple plans (see plancache.c). IMO this sounds like punting the planner's task to executor. Not to mention some overhead for maintaining various plans that might have been discarded early in the planning and path cost evaluation phase (consider a path with pathkeys specified, like with ORDINALITY. Can there be edge cases where we might end up invalidating the entire path if we let executor modify it, or, maybe just lose the ordinality optimization?) The executor isn't modifying the plan, it's just picking one based on what the current situation is (which is information that only the executor can have, such as if there are pending deferred triggers). I agree that executor should not modify plans, but letting executor choose the plan to execute (out of a set from planner, of course) rather than planner giving executor a single plan and executor not caring about the semantics, seems a bit counterintuitive to me. It might be just me though. I don't think it follows that the executor is now required to care about semantics. The planner says use plan A if X is true; use plan B is X is not true and then the executor does exactly that. There's nothing about the plans provided by the planner which are being changed and there is no re-planning going on (though, as I point out, we actually *do* re-plan in cases where we think the new plan is much much better than the prior plan..). Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Removing INNER JOINs
* Stephen Frost (sfr...@snowman.net) wrote: * Atri Sharma (atri.j...@gmail.com) wrote: So the planner keeps all possibility satisfying plans, or it looks at the possible conditions (like presence of foreign key for this case, for eg) and then lets executor choose between them? Right, this was one of the thoughts that I had. Erm, I had also. Don't mean to imply that it was all my idea or something silly like that. Thanks, Stephen signature.asc Description: Digital signature
[HACKERS] Re: [HACKERS] On partitioning
maybe vertica's approach will be a useful example http://my.vertica.com/docs/7.1.x/HTML/index.htm#Authoring/AdministratorsGuide/Partitions/PartitioningTables.htm http://my.vertica.com/docs/7.1.x/HTML/index.htm#Authoring/SQLReferenceManual/Statements/CREATETABLE.htm ... [ PARTITION BY partition-clause ] -- Mikhail
Re: [HACKERS] Using pg_rewind for differential backup
On Wed, Dec 3, 2014 at 1:32 AM, Robert Haas robertmh...@gmail.com wrote: On Fri, Nov 28, 2014 at 2:49 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: It also would be quite straightforward to write a separate tool to do just that. Would be better than conflating pg_rewind with this. You could use pg_rewind as the basis for it - it's under the same license as PostgreSQL. Agree. I would want to work on that. If we had such a tool in core, would that completely solve the differential backup problem, or would more be needed? In my opinion yes. This discussion is not really adapted on hackers as pg_rewind is not included in Postgres core code. Please let's discuss your proposal there. Got it. But given the above, can we keep this discussion thread here? Best Regards, *Sameer Kumar | Database Consultant* *ASHNIK PTE. LTD.* 101 Cecil Street, #11-11 Tong Eng Building, Singapore 069533 M: *+65 8110 0350* T: +65 6438 3504 | www.ashnik.com *[image: icons]* [image: Email patch] http://www.ashnik.com/ This email may contain confidential, privileged or copyright material and is solely for the use of the intended recipient(s).
Re: [HACKERS] Removing INNER JOINs
On 2014-12-03 11:30:32 +, Simon Riggs wrote: I guess we need an Option node. Tom and I discussed that about an aeon ago. The Option node has a plan for each situation. At execution time, we make the test specified in the plan and then select the appropriate subplan. That way we can see what is happening in the plan and the executor doesn't need to edit anything. Given David's result where he noticed a performance impact due to the additional branch in the join code - which I still have a bit of a hard time to believe - it seems likely that a whole separate node that has to pass stuff around will be more expensive. I think the switch would actually have to be done in ExecInitNode() et al. David, if you essentially take your previous solution and move the if into ExecInitNode(), does it work well? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing INNER JOINs
On Wed, Dec 3, 2014 at 4:29 AM, David Rowley dgrowle...@gmail.com wrote: *** Method 1: Removing Inner Joins at planning time: *** Method 2: Marking scans as possibly skippable during planning, and skipping joins at execution (Andres' method) *** Method 3: Marking scans as possibly skippable during planning and removing redundant join nodes at executor startup (Simon's method) [] a. can we invoke the planner during executor init? I'm pretty sure that we can't safely invoke the planner during executor startup, and that doing surgery on the plan tree (option #3) is unsafe also. I'm pretty clear why the latter is unsafe: it might be a copy of a data structure that's going to be reused. I am less clear on the specifics of why the former is unsafe, but what I think it boils down to is that the plan per se needs to be finalized before we begin execution; any replanning needs to be handled in the plancache code. I am not sure whether it's feasible to do something about this at the plancache layer; we have an is_oneshot flag there, so perhaps one-shot plans could simply test whether there are pending triggers, and non-oneshot plans could forego the optimization until we come up with something better. If that doesn't work for some reason, then I think we basically have to give up on the idea of replanning if the situation becomes unsafe between planning and execution. That leaves us with two alternatives. One is to create a plan incorporating the optimization and another not incorporating the optimization and decide between them at runtime, which sounds expensive. The second is to create a plan that contemplates performing the join and skip the join if it turns out to be possible, living with the fact that the resulting plan might be less than optimal - in other words, option #2. I am not sure that's all that bad. Planning is ALWAYS an exercise in predicting the future: we use statistics gathered at some point in the past, which are furthermore imprecise, to predict what will happen if we try to execute a given plan at some point in the future. Sometimes we are wrong, but that doesn't prevent us from trying to our best to predict the outcome; so here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing INNER JOINs
On Wed, Dec 3, 2014 at 8:32 PM, Stephen Frost sfr...@snowman.net wrote: * Atri Sharma (atri.j...@gmail.com) wrote: So the planner keeps all possibility satisfying plans, or it looks at the possible conditions (like presence of foreign key for this case, for eg) and then lets executor choose between them? Right, this was one of the thoughts that I had. So is the idea essentially making the planner return a set of best plans, one for each condition? Are we assured of their optimality at the local level i.e. at each possibility? We *already* have an idea of there being multiple plans (see plancache.c). Thanks for pointing me there. What I am concerned about is that in this case, the option plans are competing plans rather than separate plans. My main concern is that we might be not able to discard plans that we know that are not optimal early in planning. My understanding is that planner is aggressive when discarding potential paths. Maintaining them ahead and storing and returning them might have issues, but that is only my thought. -- Regards, Atri *l'apprenant*
Re: [HACKERS] Removing INNER JOINs
On 2014-12-03 10:51:19 -0500, Robert Haas wrote: On Wed, Dec 3, 2014 at 4:29 AM, David Rowley dgrowle...@gmail.com wrote: *** Method 1: Removing Inner Joins at planning time: *** Method 2: Marking scans as possibly skippable during planning, and skipping joins at execution (Andres' method) *** Method 3: Marking scans as possibly skippable during planning and removing redundant join nodes at executor startup (Simon's method) [] a. can we invoke the planner during executor init? I'm pretty sure that we can't safely invoke the planner during executor startup, and that doing surgery on the plan tree (option #3) is unsafe also. I'm pretty clear why the latter is unsafe: it might be a copy of a data structure that's going to be reused. We already have a transformation between the plan and execution tree. I'm right now not seing why transforming the trees in ExecInitNode() et. al. would be unsafe - it looks fairly simple to switch between different execution plans there. Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tracking commit timestamps
On Mon, Dec 1, 2014 at 5:34 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: I made two more changes: 1. introduce newestCommitTs. Original code was using lastCommitXact to check that no future transaction is asked for, but this doesn't really work if a long-running transaction is committed, because asking for transactions with a higher Xid but which were committed earlier would raise an error. I'm kind of disappointed that, in spite of previous review comments, this got committed with extensive use of the CommitTs naming. I think that's confusing, but it's also something that will be awkward if we want to add other data, such as the much-discussed commit LSN, to the facility. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing INNER JOINs
On Wed, Dec 3, 2014 at 10:56 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-12-03 10:51:19 -0500, Robert Haas wrote: On Wed, Dec 3, 2014 at 4:29 AM, David Rowley dgrowle...@gmail.com wrote: *** Method 1: Removing Inner Joins at planning time: *** Method 2: Marking scans as possibly skippable during planning, and skipping joins at execution (Andres' method) *** Method 3: Marking scans as possibly skippable during planning and removing redundant join nodes at executor startup (Simon's method) [] a. can we invoke the planner during executor init? I'm pretty sure that we can't safely invoke the planner during executor startup, and that doing surgery on the plan tree (option #3) is unsafe also. I'm pretty clear why the latter is unsafe: it might be a copy of a data structure that's going to be reused. We already have a transformation between the plan and execution tree. We do? I think what we have is a plan tree, which is potentially stored in a plan cache someplace and thus must be read-only, and a planstate tree, which contains the stuff that is for this specific execution. There's probably some freedom to do exciting things in the planstate nodes, but I don't think you can tinker with the plan itself. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On partitioning
* k...@rice.edu (k...@rice.edu) wrote: On Wed, Dec 03, 2014 at 10:00:26AM -0300, Alvaro Herrera wrote: In my design I initially had overflow partitions too, because I inherited the idea from Itagaki Takahiro's patch. Eventually I realized that it's a useless concept, because you can always have leftmost and rightmost partitions, which are just regular partitions (except they don't have a low key, resp. high key). If you don't define unbounded partitions at either side, it's fine, you just raise an error whenever the user tries to insert a value for which there is no partition. Maybe I am not clear on the concept of an overflow partition, but I thought that it functioned to catch any record that did not fit the partitioning scheme. You end of range with out a low key or high key would only catch problems in those areas. If you partitioned on work days of the week, you should not have anything on Saturday/Sunday. How would that work? You would want to catch anything that was not a weekday in the overflow. Yeah, I'm not a big fan of just dropping data on the floor either. That's the perview of CHECK constraints and shouldn't be a factor of the partitioning system, imv. There is a flip side to this though, which is that users who have those CHECK constraints probably don't want to be bothered by having to have an overflow partition, which leads into the question of, if we have them as a supported capability, what would the default be? My gut feeling is that the default should be 'no overflow', in which case I'm not sure it's useful as it won't be there for these cases where strange data shows up unexpectedly and the system wants to put it somewhere. Supporting overflow partitions would also mean supporting the ability to move data out of those partitions and into 'real' partitions which the user creates to deal with the odd/new data. That doesn't strike me as being too much fun for us to have to figure out, though if we do, we might be able to do a better job (with less blocking happening, etc) than the user could. Lastly, my inclination is that it's a capability which could be added later if there is demand for it, so perhaps the best answer is to not include it now (feature creep and all that). Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Removing INNER JOINs
On 2014-12-03 11:11:49 -0500, Robert Haas wrote: On Wed, Dec 3, 2014 at 10:56 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-12-03 10:51:19 -0500, Robert Haas wrote: On Wed, Dec 3, 2014 at 4:29 AM, David Rowley dgrowle...@gmail.com wrote: *** Method 1: Removing Inner Joins at planning time: *** Method 2: Marking scans as possibly skippable during planning, and skipping joins at execution (Andres' method) *** Method 3: Marking scans as possibly skippable during planning and removing redundant join nodes at executor startup (Simon's method) [] a. can we invoke the planner during executor init? I'm pretty sure that we can't safely invoke the planner during executor startup, and that doing surgery on the plan tree (option #3) is unsafe also. I'm pretty clear why the latter is unsafe: it might be a copy of a data structure that's going to be reused. We already have a transformation between the plan and execution tree. We do? I think what we have is a plan tree, which is potentially stored in a plan cache someplace and thus must be read-only, and a planstate tree, which contains the stuff that is for this specific execution. There's probably some freedom to do exciting things in the planstate nodes, but I don't think you can tinker with the plan itself. Well, the planstate tree is what determines the execution, right? I don't see what would stop us from doing something like replacing: PlanState * ExecInitNode(Plan *node, EState *estate, int eflags) { ... case T_NestLoop: result = (PlanState *) ExecInitNestLoop((NestLoop *) node, estate, eflags); by case T_NestLoop: if (JoinCanBeSkipped(node)) result = NonSkippedJoinNode(node); else result = (PlanState *) ExecInitNestLoop((NestLoop *) node, estate, eflags); Where JoinCanBeSkipped() and NonSkippedJoinNode() contain the logic from David's early patch where he put the logic entirely into the actual execution phase. We'd probably want to move the join nodes into a separate ExecInitJoin() function and do the JoinCanBeSkipped() and NonSkippedJoin() node in the generic code. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing INNER JOINs
* Atri Sharma (atri.j...@gmail.com) wrote: What I am concerned about is that in this case, the option plans are competing plans rather than separate plans. Not sure I follow this thought entirely.. The plans in the plancache are competeing, but separate, plans. My main concern is that we might be not able to discard plans that we know that are not optimal early in planning. My understanding is that planner is aggressive when discarding potential paths. Maintaining them ahead and storing and returning them might have issues, but that is only my thought. The planner is aggressive at discarding potential paths, but this is all a consideration for how expensive this particular optimization is, not an issue with the approach itself. We certainly don't want an optimization that doubles the time for 100% of queries planned but only saves time in 5% of the cases, but if we can spend an extra 5% of the time required for planning in the 1% of cases where the optimization could possibly happen to save a huge amount of time for those queries, then it's something to consider. We would definitely want to spend as little time as possible checking for this optimization in cases where it isn't possible to use the optimization. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Removing INNER JOINs
On Wed, Dec 3, 2014 at 11:23 AM, Andres Freund and...@2ndquadrant.com wrote: Well, the planstate tree is what determines the execution, right? I don't see what would stop us from doing something like replacing: PlanState * ExecInitNode(Plan *node, EState *estate, int eflags) { ... case T_NestLoop: result = (PlanState *) ExecInitNestLoop((NestLoop *) node, estate, eflags); by case T_NestLoop: if (JoinCanBeSkipped(node)) result = NonSkippedJoinNode(node); else result = (PlanState *) ExecInitNestLoop((NestLoop *) node, estate, eflags); Where JoinCanBeSkipped() and NonSkippedJoinNode() contain the logic from David's early patch where he put the logic entirely into the actual execution phase. Yeah, maybe. I think there's sort of a coding principle that the plan and planstate trees should match up one-to-one, but it's possible that nothing breaks if they don't, or that I've misunderstood the coding rule in the first instance. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing INNER JOINs
Robert Haas robertmh...@gmail.com writes: On Wed, Dec 3, 2014 at 11:23 AM, Andres Freund and...@2ndquadrant.com wrote: Well, the planstate tree is what determines the execution, right? I don't see what would stop us from doing something like replacing: PlanState * ExecInitNode(Plan *node, EState *estate, int eflags) { ... case T_NestLoop: result = (PlanState *) ExecInitNestLoop((NestLoop *) node, estate, eflags); by case T_NestLoop: if (JoinCanBeSkipped(node)) result = NonSkippedJoinNode(node); else result = (PlanState *) ExecInitNestLoop((NestLoop *) node, estate, eflags); Where JoinCanBeSkipped() and NonSkippedJoinNode() contain the logic from David's early patch where he put the logic entirely into the actual execution phase. Yeah, maybe. I think there's sort of a coding principle that the plan and planstate trees should match up one-to-one, but it's possible that nothing breaks if they don't, or that I've misunderstood the coding rule in the first instance. Far better would be what I mentioned upthread: an explicit switch node in the plan tree, analogous to the existing AlternativeSubPlan structure. ChooseJoinSubPlan - plan tree requiring all tables to be joined - plan tree not requiring all tables to be joined This allows sensible display by EXPLAIN and avoids the need for the core executor code to be dirtied with implementation of the precise switch rule: all that logic goes into the ChooseJoinSubPlan plan node code. I would envision the planner starting out generating the first subplan (without the optimization), but as it goes along, noting whether there are any opportunities for join removal. At the end, if it found that there were such opportunities, re-plan assuming that removal is possible. Then stick a switch node on top. This would give optimal plans for both cases, and it would avoid the need for lots of extra planner cycles when the optimization can't be applied ... except for one small detail, which is that the planner has a bad habit of scribbling on its own input. I'm not sure how much cleanup work would be needed before that re-plan operation could happen as easily as is suggested above. But in principle this could be made to work. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing INNER JOINs
On Wed, Dec 3, 2014 at 12:08 PM, Tom Lane t...@sss.pgh.pa.us wrote: I would envision the planner starting out generating the first subplan (without the optimization), but as it goes along, noting whether there are any opportunities for join removal. At the end, if it found that there were such opportunities, re-plan assuming that removal is possible. Then stick a switch node on top. This would give optimal plans for both cases, and it would avoid the need for lots of extra planner cycles when the optimization can't be applied ... except for one small detail, which is that the planner has a bad habit of scribbling on its own input. I'm not sure how much cleanup work would be needed before that re-plan operation could happen as easily as is suggested above. But in principle this could be made to work. Doesn't this double the planning overhead, in most cases for no benefit? The alternative plan used only when there are deferred triggers is rarely going to get used. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing INNER JOINs
Robert Haas robertmh...@gmail.com writes: On Wed, Dec 3, 2014 at 12:08 PM, Tom Lane t...@sss.pgh.pa.us wrote: I would envision the planner starting out generating the first subplan (without the optimization), but as it goes along, noting whether there are any opportunities for join removal. At the end, if it found that there were such opportunities, re-plan assuming that removal is possible. Then stick a switch node on top. This would give optimal plans for both cases, and it would avoid the need for lots of extra planner cycles when the optimization can't be applied ... except for one small detail, which is that the planner has a bad habit of scribbling on its own input. I'm not sure how much cleanup work would be needed before that re-plan operation could happen as easily as is suggested above. But in principle this could be made to work. Doesn't this double the planning overhead, in most cases for no benefit? The alternative plan used only when there are deferred triggers is rarely going to get used. Personally, I remain of the opinion that this optimization will apply in only a tiny fraction of real-world cases, so I'm mostly concerned about not blowing out planning time when the optimization doesn't apply. However, even granting that that is a concern, so what? You *have* to do the planning twice, or you're going to be generating a crap plan for one case or the other. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing INNER JOINs
* Tom Lane (t...@sss.pgh.pa.us) wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Dec 3, 2014 at 12:08 PM, Tom Lane t...@sss.pgh.pa.us wrote: I would envision the planner starting out generating the first subplan (without the optimization), but as it goes along, noting whether there are any opportunities for join removal. At the end, if it found that there were such opportunities, re-plan assuming that removal is possible. Then stick a switch node on top. This would give optimal plans for both cases, and it would avoid the need for lots of extra planner cycles when the optimization can't be applied ... except for one small detail, which is that the planner has a bad habit of scribbling on its own input. I'm not sure how much cleanup work would be needed before that re-plan operation could happen as easily as is suggested above. But in principle this could be made to work. Doesn't this double the planning overhead, in most cases for no benefit? The alternative plan used only when there are deferred triggers is rarely going to get used. Personally, I remain of the opinion that this optimization will apply in only a tiny fraction of real-world cases, so I'm mostly concerned about not blowing out planning time when the optimization doesn't apply. This was my thought also- most of the time we won't be able to apply the optimization and we'll know that pretty early on and can skip the double planning. What makes this worthwhile is that there are cases where it'll be applied regularly due to certain tools/technologies being used and the extra planning will be more than made up for by the reduction in execution time. However, even granting that that is a concern, so what? You *have* to do the planning twice, or you're going to be generating a crap plan for one case or the other. Yeah, I don't see a way around that.. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Removing INNER JOINs
Stephen Frost sfr...@snowman.net writes: * Tom Lane (t...@sss.pgh.pa.us) wrote: However, even granting that that is a concern, so what? You *have* to do the planning twice, or you're going to be generating a crap plan for one case or the other. Yeah, I don't see a way around that.. Also, it occurs to me that it's only necessary to repeat the join search part of the process, which means that in principle the mechanisms already exist for that; see GEQO. This means that for small join problems, the total planning time would much less than double anyway. For large problems, where the join search is the bulk of the time, we could hope that removal of unnecessary joins would reduce the join search runtime enough that the second search would be pretty negligible next to the first (which is not optional). So I think it'll double the runtime is an unfounded objection, or at least there's good reason to hope it's unfounded. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing INNER JOINs
On Wed, Dec 3, 2014 at 11:03 PM, Tom Lane t...@sss.pgh.pa.us wrote: Stephen Frost sfr...@snowman.net writes: * Tom Lane (t...@sss.pgh.pa.us) wrote: However, even granting that that is a concern, so what? You *have* to do the planning twice, or you're going to be generating a crap plan for one case or the other. Yeah, I don't see a way around that.. Also, it occurs to me that it's only necessary to repeat the join search part of the process, which means that in principle the mechanisms already exist for that; see GEQO. This means that for small join problems, the total planning time would much less than double anyway. For large problems, where the join search is the bulk of the time, we could hope that removal of unnecessary joins would reduce the join search runtime enough that the second search would be pretty negligible next to the first (which is not optional). So I think it'll double the runtime is an unfounded objection, or at least there's good reason to hope it's unfounded. Is it possible to only replan part of the plan in case of this optimization? I think that we might need to only replan parts of the original plan (as you mentioned, join search and above). So we could reuse the original plan in part and not do a lot of replanning (an obvious case is scan strategy, which we can assume will not change for the two plans). I wonder if we could have a rule based system for replacement of some plan nodes with other type of nodes. As we discover more cases like this, we could add more rules. Wild thought though. -- Regards, Atri *l'apprenant*
Re: [HACKERS] Removing INNER JOINs
On Wed, Dec 3, 2014 at 12:33 PM, Tom Lane t...@sss.pgh.pa.us wrote: Stephen Frost sfr...@snowman.net writes: * Tom Lane (t...@sss.pgh.pa.us) wrote: However, even granting that that is a concern, so what? You *have* to do the planning twice, or you're going to be generating a crap plan for one case or the other. Yeah, I don't see a way around that.. Also, it occurs to me that it's only necessary to repeat the join search part of the process, which means that in principle the mechanisms already exist for that; see GEQO. This means that for small join problems, the total planning time would much less than double anyway. For large problems, where the join search is the bulk of the time, we could hope that removal of unnecessary joins would reduce the join search runtime enough that the second search would be pretty negligible next to the first (which is not optional). So I think it'll double the runtime is an unfounded objection, or at least there's good reason to hope it's unfounded. OK. One other point of hope is that, in my experience, the queries where you need join removal are the ones where there are lots of tables being joined and there are often quite a few of those joins that can be removed, not just one. So the extra planner overhead might pay off anyway. (It still seems a shame to have to plan for the not-removing-the-joins case since it will so rarely happen. But maybe I should take what I can get.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] changing primary key col(s) with minimal impact
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 I have a need to provide a way to change a table's primary key columns, in the possible presence of foreign keys, and with concurrent use (as much as possible). The best way I have come up with is roughly: a. create the to-be-new-pk index concurrently b. demote the old pk to be a unique constraint c. alter table add constraint new pk using existing index Step b. involves (in one transaction): - -- * update pg_class row for the table relhaspkey false * update pg_constraint row for the original pk contype = 'u' conname = 'some_new_name' * update pg_index row for the original pk index indisprimary = false * alter index original pk rename to some_new_name - -- I have tested this (minimally) and as far as I can tell it works. Questions: 1) any major holes in this approach? 2) any better ideas? 3) should we consider an ALTER TABLE ALTER CONSTRAINT command (or some such) to allow demotion of a PRIMARY KEY to a UNIQUE constraint? Thanks for any thoughts/comments. Joe - -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, 24x7 Support -BEGIN PGP SIGNATURE- Version: GnuPG v1 iQIcBAEBAgAGBQJUf0t5AAoJEDfy90M199hlUT8QAJcobqTmT0fKfBaXbIjXzJLZ ++i0JC/RwDOfnNkUjrEkv+WlQlhwHViKd04JCLWKes8EE81Vv8qUTlQDzphxeHCS OiPMKSePHzCSphYRwzGp1gurvfcw/Yv/5wQyCcUWBqpa4DYZzUFKkeoH3d2Zzd6z t6iR8cM21XArFnGMQN1gmAA0swStrm6CdzxydEIY3EoY8tgIGQBjDAEzs/v4bYPG kEECRYtNu3tiy+ejslB+WWPXd0y+Ty8idRpmgMPMLqIi0RzA8s/p4xhm1INUyr31 fqlN3vaFaPEgXdsuDEqmXt7H0QWyCIwEYelPTh3Zi8RmOOzzeZKNdjaQc6mJHrO7 JFVFMVaw/CRuogui9Q/DMalpbNWjbqoVV4JtPssGK1BBvmpJrEd7QLzFLmz3QSzn Rdb/UN8RWWAfL50MAztlSpwX/4vPbolvC7yMjg1lGvfm8g0B3qz+iHW3V0G1qX8Y mxQD3LvnMgUN/m2EUiUr+L+Eh3fEV0M3SbQCii6b+apLjVGe25pHE8zx4QufrJKk ftX70nRJPaoW/+LBj69n7r7wde3CSpI2/6qbjIXYKu/gutWQEPgxbbd4fxTGICz/ P16y2V4mwKT66Ma3vjz1gwCGcHSShLsJx4PAGFMpR3SE63kDvGE0Zm/RG5u9+z/r MIkw1EL3cFpTlu+7Jtwj =pyzW -END PGP SIGNATURE- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing INNER JOINs
Atri Sharma atri.j...@gmail.com writes: Is it possible to only replan part of the plan in case of this optimization? I think that we might need to only replan parts of the original plan (as you mentioned, join search and above). So we could reuse the original plan in part and not do a lot of replanning (an obvious case is scan strategy, which we can assume will not change for the two plans). I think you assume wrong; or at least, I certainly would not wish to hard-wire any such assumption. Skipping some joins could change the shape of the join tree *completely*, because the cost estimates will change so much. And that could in turn lead to making different choices of scan methods, eg, we might or might not care about sort order of a scan result if we change join methods. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing INNER JOINs
On Wed, Dec 3, 2014 at 11:19 PM, Tom Lane t...@sss.pgh.pa.us wrote: Atri Sharma atri.j...@gmail.com writes: Is it possible to only replan part of the plan in case of this optimization? I think that we might need to only replan parts of the original plan (as you mentioned, join search and above). So we could reuse the original plan in part and not do a lot of replanning (an obvious case is scan strategy, which we can assume will not change for the two plans). I think you assume wrong; or at least, I certainly would not wish to hard-wire any such assumption. Skipping some joins could change the shape of the join tree *completely*, because the cost estimates will change so much. And that could in turn lead to making different choices of scan methods, eg, we might or might not care about sort order of a scan result if we change join methods. regards, tom lane Agreed, but in some cases, we could possibly make some assumptions (if there is no index, if a large fraction of table will be returned in scan, FunctionScan).
Re: [HACKERS] Removing INNER JOINs
* Atri Sharma (atri.j...@gmail.com) wrote: Agreed, but in some cases, we could possibly make some assumptions (if there is no index, if a large fraction of table will be returned in scan, FunctionScan). All neat ideas but how about we get something which works in the way being asked for before we start trying to optimize it..? Maybe I'm missing something, but getting all of this infrastructure into place and making sure things aren't done to the plan tree which shouldn't be (or done to all of them if necessary..) is enough that we should get that bit done first and then worry if there are ways we can further improve things.. THanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Removing INNER JOINs
On Wed, Dec 3, 2014 at 11:27 PM, Stephen Frost sfr...@snowman.net wrote: * Atri Sharma (atri.j...@gmail.com) wrote: Agreed, but in some cases, we could possibly make some assumptions (if there is no index, if a large fraction of table will be returned in scan, FunctionScan). All neat ideas but how about we get something which works in the way being asked for before we start trying to optimize it..? Maybe I'm missing something, but getting all of this infrastructure into place and making sure things aren't done to the plan tree which shouldn't be (or done to all of them if necessary..) is enough that we should get that bit done first and then worry if there are ways we can further improve things.. Right,sorry for digressing. I think we are in agreement as to what needs to be done (start with a plan, note ideas and replan if necessary). The idea of executor modifying the plan (or personally, even choosing the plan) seems counterintuitive. Does it also make sense to recalculate the costs from scratch for the replan? It might be, I am just asking. Regards, Atri
Re: [HACKERS] Removing INNER JOINs
Stephen Frost sfr...@snowman.net writes: * Atri Sharma (atri.j...@gmail.com) wrote: Agreed, but in some cases, we could possibly make some assumptions (if there is no index, if a large fraction of table will be returned in scan, FunctionScan). All neat ideas but how about we get something which works in the way being asked for before we start trying to optimize it..? Maybe I'm missing something, but getting all of this infrastructure into place and making sure things aren't done to the plan tree which shouldn't be (or done to all of them if necessary..) is enough that we should get that bit done first and then worry if there are ways we can further improve things.. Yeah; moreover, there's no evidence that hard-wiring such assumptions would save anything. In the example of a FunctionScan, guess what: there's only one Path for that relation anyway. I think the right approach for now is to emulate the GEQO precedent as closely as possible. Build all the single-relation Paths the same as now, then do a join search over all the relations, then (if we've noticed that some joins are potentially removable) do another join search over just the nonremovable relations. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tracking commit timestamps
Robert Haas wrote: On Mon, Dec 1, 2014 at 5:34 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: I made two more changes: 1. introduce newestCommitTs. Original code was using lastCommitXact to check that no future transaction is asked for, but this doesn't really work if a long-running transaction is committed, because asking for transactions with a higher Xid but which were committed earlier would raise an error. I'm kind of disappointed that, in spite of previous review comments, this got committed with extensive use of the CommitTs naming. I think that's confusing, but it's also something that will be awkward if we want to add other data, such as the much-discussed commit LSN, to the facility. I never saw a comment that CommitTs was an unwanted name. There were some that said that committs wasn't liked because it looked like a misspelling, so we added an underscore -- stuff in lower case is commit_ts everywhere. Stuff in camel case didn't get the underscore because it didn't seem necessary. But other than that issue, the name wasn't questioned, as far as I'm aware. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing INNER JOINs
Atri Sharma atri.j...@gmail.com writes: Does it also make sense to recalculate the costs from scratch for the replan? It might be, I am just asking. The join costs would be recalculated from scratch, yes. The single-relation Paths would already exist and their costs would not change. Again, if you've not studied how GEQO works, you probably should go do that before thinking more about how this would work. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing INNER JOINs
On Wed, Dec 3, 2014 at 11:38 PM, Tom Lane t...@sss.pgh.pa.us wrote: Stephen Frost sfr...@snowman.net writes: * Atri Sharma (atri.j...@gmail.com) wrote: Agreed, but in some cases, we could possibly make some assumptions (if there is no index, if a large fraction of table will be returned in scan, FunctionScan). All neat ideas but how about we get something which works in the way being asked for before we start trying to optimize it..? Maybe I'm missing something, but getting all of this infrastructure into place and making sure things aren't done to the plan tree which shouldn't be (or done to all of them if necessary..) is enough that we should get that bit done first and then worry if there are ways we can further improve things.. Yeah; moreover, there's no evidence that hard-wiring such assumptions would save anything. In the example of a FunctionScan, guess what: there's only one Path for that relation anyway. That is precisely what I meant :) I guess I was being too over cautious and even trying to save the time spent in evaluating whatever paths we have and building new FunctionScan paths... I think the right approach for now is to emulate the GEQO precedent as closely as possible. Build all the single-relation Paths the same as now, then do a join search over all the relations, then (if we've noticed that some joins are potentially removable) do another join search over just the nonremovable relations. How about using geqo more liberally when replanning (decrease the number of relations in join before geqo is hit?) -- Regards, Atri *l'apprenant*
Re: [HACKERS] Removing INNER JOINs
Atri Sharma atri.j...@gmail.com writes: On Wed, Dec 3, 2014 at 11:38 PM, Tom Lane t...@sss.pgh.pa.us wrote: I think the right approach for now is to emulate the GEQO precedent as closely as possible. Build all the single-relation Paths the same as now, then do a join search over all the relations, then (if we've noticed that some joins are potentially removable) do another join search over just the nonremovable relations. How about using geqo more liberally when replanning (decrease the number of relations in join before geqo is hit?) This is going to be quite difficult enough without overcomplicating it. Or as a wise man once said, premature optimization is the root of all evil. Get it working in the basic way and then see if improvement is necessary at all. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing INNER JOINs
On Wed, Dec 3, 2014 at 11:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: Atri Sharma atri.j...@gmail.com writes: On Wed, Dec 3, 2014 at 11:38 PM, Tom Lane t...@sss.pgh.pa.us wrote: I think the right approach for now is to emulate the GEQO precedent as closely as possible. Build all the single-relation Paths the same as now, then do a join search over all the relations, then (if we've noticed that some joins are potentially removable) do another join search over just the nonremovable relations. How about using geqo more liberally when replanning (decrease the number of relations in join before geqo is hit?) This is going to be quite difficult enough without overcomplicating it. Or as a wise man once said, premature optimization is the root of all evil. Get it working in the basic way and then see if improvement is necessary at all. Sure, I can take a crack at it since I am working on a patch that does require this alternative path approach. Let me try something and report my experimental results.
Re: [HACKERS] Removing INNER JOINs
On 12/03/2014 07:41 PM, Robert Haas wrote: On Wed, Dec 3, 2014 at 12:33 PM, Tom Lane t...@sss.pgh.pa.us wrote: Stephen Frost sfr...@snowman.net writes: * Tom Lane (t...@sss.pgh.pa.us) wrote: However, even granting that that is a concern, so what? You *have* to do the planning twice, or you're going to be generating a crap plan for one case or the other. Yeah, I don't see a way around that.. Also, it occurs to me that it's only necessary to repeat the join search part of the process, which means that in principle the mechanisms already exist for that; see GEQO. This means that for small join problems, the total planning time would much less than double anyway. For large problems, where the join search is the bulk of the time, we could hope that removal of unnecessary joins would reduce the join search runtime enough that the second search would be pretty negligible next to the first (which is not optional). So I think it'll double the runtime is an unfounded objection, or at least there's good reason to hope it's unfounded. OK. One other point of hope is that, in my experience, the queries where you need join removal are the ones where there are lots of tables being joined and there are often quite a few of those joins that can be removed, not just one. So the extra planner overhead might pay off anyway. Do you need to plan for every combination, where some joins are removed and some are not? I hope the same mechanism could be used to prepare a plan for a query with parameters, where the parameters might or might not allow a partial index to be used. We have some smarts nowadays to use custom plans, but this could be better. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing INNER JOINs
Heikki Linnakangas hlinnakan...@vmware.com writes: Do you need to plan for every combination, where some joins are removed and some are not? I would vote for just having two plans and one switch node. To exploit any finer grain, we'd have to have infrastructure that would let us figure out *which* constraints pending triggers might indicate transient invalidity of, and that doesn't seem likely to be worth the trouble. I hope the same mechanism could be used to prepare a plan for a query with parameters, where the parameters might or might not allow a partial index to be used. We have some smarts nowadays to use custom plans, but this could be better. Interesting thought, but that would be a totally different switch condition ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing INNER JOINs
On Wed, Dec 03, 2014 at 02:08:27PM -0500, Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: Do you need to plan for every combination, where some joins are removed and some are not? I would vote for just having two plans and one switch node. To exploit any finer grain, we'd have to have infrastructure that would let us figure out *which* constraints pending triggers might indicate transient invalidity of, and that doesn't seem likely to be worth the trouble. I hope the same mechanism could be used to prepare a plan for a query with parameters, where the parameters might or might not allow a partial index to be used. We have some smarts nowadays to use custom plans, but this could be better. Interesting thought, but that would be a totally different switch condition ... regards, tom lane Or between a node with a low rows count and a high rows count for those pesky mis-estimation queries. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing INNER JOINs
On Wed, Dec 3, 2014 at 2:09 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, Dec 3, 2014 at 12:08 PM, Tom Lane t...@sss.pgh.pa.us wrote: I would envision the planner starting out generating the first subplan (without the optimization), but as it goes along, noting whether there are any opportunities for join removal. At the end, if it found that there were such opportunities, re-plan assuming that removal is possible. Then stick a switch node on top. This would give optimal plans for both cases, and it would avoid the need for lots of extra planner cycles when the optimization can't be applied ... except for one small detail, which is that the planner has a bad habit of scribbling on its own input. I'm not sure how much cleanup work would be needed before that re-plan operation could happen as easily as is suggested above. But in principle this could be made to work. Doesn't this double the planning overhead, in most cases for no benefit? The alternative plan used only when there are deferred triggers is rarely going to get used. It shouldn't. It will only double (at worst) planning overhead for the queries that do have removable joins, which would be the ones benefiting from the extra work. Whether that extra work pays off is the question to ask here. Perhaps whether or not to remove the joins could be a decision made accounting for overall plan cost and fraction of joins removed, as to avoid the extra planning if execution will be fast anyway. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tracking commit timestamps
Alvaro Herrera wrote: Robert Haas wrote: I'm kind of disappointed that, in spite of previous review comments, this got committed with extensive use of the CommitTs naming. I think that's confusing, but it's also something that will be awkward if we want to add other data, such as the much-discussed commit LSN, to the facility. I never saw a comment that CommitTs was an unwanted name. There were some that said that committs wasn't liked because it looked like a misspelling, so we added an underscore -- stuff in lower case is commit_ts everywhere. Stuff in camel case didn't get the underscore because it didn't seem necessary. But other than that issue, the name wasn't questioned, as far as I'm aware. I found one email where you said you didn't like committs and preferred commit_timestamp instead. I don't see how making that change would have made you happy wrt the concern you just expressed. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Dec 2, 2014 at 1:21 PM, Peter Geoghegan p...@heroku.com wrote: Incidentally, I think that an under-appreciated possible source of regressions here is that attributes abbreviated have a strong physical/logical correlation. I could see a small regression for one such case even though my cost model indicated that it should be very profitable. This was the column in question: postgres=# select * from pg_stats where tablename = 'ohio_voters' and attname = 'mailing_address1' ; -[ RECORD 1 ]- schemaname | public tablename | ohio_voters attname| mailing_address1 inherited | f null_frac | 0 avg_width | 5 n_distinct | 789 most_common_vals | {} most_common_freqs | {0.969267} histogram_bounds | SNIP *** correlation| 0.944785 SNIP *** This n_distinct is wrong, though. In fact, the number of distinct columns is 25,946, while the number of distinct abbreviated keys is 13,691. So correlation was not the dominant factor here (although it was probably still a factor) - rather, the dominant factor was that the vast majority of comparisons would get away with an opportunistic memcmp() == 0 anyway (although not with Postgres 9.4), and so my baseline is very fast for this case. This would not have come up had the value been represented as NULL (as it clearly should have been), since that would not undergo strxfrm() transformation/abbreviation in the first place. Even still, highly skewed attributes exist in the wild, and deserve our consideration - we do not model the distribution of values within the set. I believe that these cases are rare enough, and (thanks to the already committed parts of this work) fast enough to probably not be worried about; maybe a more complex cost model could do better, but I'm inclined to think that it's not worth it. We'd end up slightly improving this case at bigger cost to other, much more common cases. Besides, equality-resolved comparisons are not necessarily much cheaper for datatypes other than Postgres 9.5 text (in a world where there is a variety of datatypes accelerated by abbreviation), which discourages a general solution. A custom format dump of this data (publicly available Ohio State voter records) is available from: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/ohio_voters.custom.dump -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tracking commit timestamps
On Wed, Dec 3, 2014 at 2:36 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Alvaro Herrera wrote: Robert Haas wrote: I'm kind of disappointed that, in spite of previous review comments, this got committed with extensive use of the CommitTs naming. I think that's confusing, but it's also something that will be awkward if we want to add other data, such as the much-discussed commit LSN, to the facility. I never saw a comment that CommitTs was an unwanted name. There were some that said that committs wasn't liked because it looked like a misspelling, so we added an underscore -- stuff in lower case is commit_ts everywhere. Stuff in camel case didn't get the underscore because it didn't seem necessary. But other than that issue, the name wasn't questioned, as far as I'm aware. I found one email where you said you didn't like committs and preferred commit_timestamp instead. I don't see how making that change would have made you happy wrt the concern you just expressed. Fair point. I'm still not sure we got this one right, but I don't know that I want to spend more time wrangling about it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] postgres_fdw does not see enums
Folks, I've been trying out 9.5-to-be's PostgreSQL FDW, and I noticed that it doesn't seem to handle enum types. Would this be a trivial fix? Repro: Create an enum type in a 9.3 instance. Use the type in a table. Create a foreign server pointing to this instance. Execute IMPORT FOREIGN SCHEMA. Pseudonymized output: db_compare=# IMPORT FOREIGN SCHEMA public FROM SERVER blah INTO blah; ERROR: type public.product_type does not exist LINE 4: product_type public.product_type OPTIONS (column_name 'pro... ^ QUERY: CREATE FOREIGN TABLE foo ( id integer OPTIONS (column_name 'id') NOT NULL, event_id integer OPTIONS (column_name 'event_id') NOT NULL, product_type public.product_type OPTIONS (column_name 'product_type') ) SERVER blah OPTIONS (schema_name 'public', table_name 'foo'); CONTEXT: importing foreign table foo Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw does not see enums
David Fetter da...@fetter.org writes: I've been trying out 9.5-to-be's PostgreSQL FDW, and I noticed that it doesn't seem to handle enum types. Would this be a trivial fix? No. How would you know whether the remote side even has the enum, let alone whether it has an identical set of members? I don't see that enums are noticeably easier than the general case of non-built-in types ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw does not see enums
On Wed, Dec 03, 2014 at 05:38:47PM -0500, Tom Lane wrote: David Fetter da...@fetter.org writes: I've been trying out 9.5-to-be's PostgreSQL FDW, and I noticed that it doesn't seem to handle enum types. Would this be a trivial fix? No. How would you know whether the remote side even has the enum, let alone whether it has an identical set of members? I don't see that enums are noticeably easier than the general case of non-built-in types ... I must be missing something important. When querying the remote side, *and it's PostgreSQL*, we have catalog access that could be used to reconstruct the enums. Or are you thinking about the case where the enum changes from one call to the next? Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw does not see enums
David Fetter da...@fetter.org writes: On Wed, Dec 03, 2014 at 05:38:47PM -0500, Tom Lane wrote: No. How would you know whether the remote side even has the enum, let alone whether it has an identical set of members? I don't see that enums are noticeably easier than the general case of non-built-in types ... I must be missing something important. When querying the remote side, *and it's PostgreSQL*, we have catalog access that could be used to reconstruct the enums. Or are you thinking about the case where the enum changes from one call to the next? What do you mean reconstruct the enum? We can't fix inconsistencies between the local enum definition and the remote definition (if any). Say the remote has a value x that we don't, it'll fail when SELECTing a row containing that value; postgres_fdw has no way to prevent such a failure. Conversely, if we have a value y that doesn't exist on the remote side, transmitting a clause enumcol = 'y' to the remote side would fail. postgres_fdw has no way to prevent that, either, save not transmitting clauses involving enums (which is exactly what it does now). I suppose we could say that if you create a foreign-table definition that includes an enum-type column, it's on your head that the enum exists and is compatibly defined on the far end. Not sure about the risk-benefit tradeoff here though. If you get it wrong (example: the two servers have different opinions about the sort order of the enum's values), you would end up with *very* hard to debug wrong results. I'm not convinced that we really want to encourage users to do that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw does not see enums
On Wed, Dec 03, 2014 at 05:52:03PM -0500, Tom Lane wrote: David Fetter da...@fetter.org writes: On Wed, Dec 03, 2014 at 05:38:47PM -0500, Tom Lane wrote: No. How would you know whether the remote side even has the enum, let alone whether it has an identical set of members? I don't see that enums are noticeably easier than the general case of non-built-in types ... I must be missing something important. When querying the remote side, *and it's PostgreSQL*, we have catalog access that could be used to reconstruct the enums. Or are you thinking about the case where the enum changes from one call to the next? What do you mean reconstruct the enum? Capture its state at the time when IMPORT FOREIGN SCHEMA is executed. Right now, if you try IMPORT SCHEMA on a foreign table with an enum in it, postgresql_fdw errors out rather than trying to notice that there's an enum definition which should precede creation and execute it in the correct order. We can't fix inconsistencies between the local enum definition and the remote definition (if any). Your objection as stated applies to just about any ALTER issued on the remote side after the IMPORT FOREIGN SCHEMA has taken effect, not just to changes in enums. This is why I built functionality into DBI-Link that refreshes foreign tables. Say the remote has a value x that we don't, it'll fail when SELECTing a row containing that value; postgres_fdw has no way to prevent such a failure. Conversely, if we have a value y that doesn't exist on the remote side, transmitting a clause enumcol = 'y' to the remote side would fail. postgres_fdw has no way to prevent that, either, save not transmitting clauses involving enums (which is exactly what it does now). I suppose we could say that if you create a foreign-table definition that includes an enum-type column, it's on your head that the enum exists and is compatibly defined on the far end. We're already saying this about some substantial fraction of ALTER TABLEs that could happen on the remote side. I don't see how including enums could make it substantially worse. Not sure about the risk-benefit tradeoff here though. If you get it wrong (example: the two servers have different opinions about the sort order of the enum's values), you would end up with *very* hard to debug wrong results. I'm not convinced that we really want to encourage users to do that. Perhaps we should add some compatibility checking functions for local vs. remote tables. The first cut of these could be, are the tables defined identically up to what we've specified in the foreign server/foreign table stuff? Subtler, looser versions might follow. For example, if the foreign table definition has VARCHAR(255) and the remote table has VARCHAR(100), it's not a catastrophe. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] changing primary key col(s) with minimal impact
Hi Joe, I have a need to provide a way to change a table's primary key columns, in the possible presence of foreign keys, and with concurrent use (as much as possible). The best way I have come up with is roughly: a. create the to-be-new-pk index concurrently b. demote the old pk to be a unique constraint c. alter table add constraint new pk using existing index Step b. involves (in one transaction): - -- * update pg_class row for the table relhaspkey false * update pg_constraint row for the original pk contype = 'u' conname = 'some_new_name' * update pg_index row for the original pk index indisprimary = false * alter index original pk rename to some_new_name - -- I have tested this (minimally) and as far as I can tell it works. Questions: 1) any major holes in this approach? 2) any better ideas? 3) should we consider an ALTER TABLE ALTER CONSTRAINT command (or some such) to allow demotion of a PRIMARY KEY to a UNIQUE constraint? Thanks for any thoughts/comments. Hi Joe, I already did something like it once, but to rebuild a bloated PK index with minimal locks. I still waiting for 'REINDEX CONCURRENTLY' :-) At that time I didn't have no trouble with this approach. Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello
Re: [HACKERS] postgres_fdw does not see enums
David Fetter da...@fetter.org writes: On Wed, Dec 03, 2014 at 05:52:03PM -0500, Tom Lane wrote: What do you mean reconstruct the enum? Capture its state at the time when IMPORT FOREIGN SCHEMA is executed. Right now, if you try IMPORT SCHEMA on a foreign table with an enum in it, postgresql_fdw errors out rather than trying to notice that there's an enum definition which should precede creation and execute it in the correct order. Oh, you think IMPORT FOREIGN SCHEMA should try to import enums? I doubt it. What happens if the enum already exists locally? And why enums, and not domains, ranges, composite types, etc? Perhaps more to the point, IMPORT FOREIGN SCHEMA is defined in the SQL standard, as are its effects, and those effects are defined as a series of CREATE FOREIGN TABLE commands. There's nothing there about trying to import types that the tables might depend on. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw does not see enums
On Wed, Dec 03, 2014 at 06:17:51PM -0500, Tom Lane wrote: David Fetter da...@fetter.org writes: On Wed, Dec 03, 2014 at 05:52:03PM -0500, Tom Lane wrote: What do you mean reconstruct the enum? Capture its state at the time when IMPORT FOREIGN SCHEMA is executed. Right now, if you try IMPORT SCHEMA on a foreign table with an enum in it, postgresql_fdw errors out rather than trying to notice that there's an enum definition which should precede creation and execute it in the correct order. Oh, you think IMPORT FOREIGN SCHEMA should try to import enums? Yes. I doubt it. What happens if the enum already exists locally? Informative error message along the lines of, local enum foo.bar doesn't match remote enum public.bar with a suitable HINT comparing the enums' values. However, I don't see much of a use case for this because INTO SCHEMA should be specifying an empty schema, or at least one without objects in it (like ENUMs) that could clash. And why enums, and not domains, ranges, composite types, etc? You'd be assuming I think those should be excluded. ;) Perhaps more to the point, IMPORT FOREIGN SCHEMA is defined in the SQL standard, as are its effects, and those effects are defined as a series of CREATE FOREIGN TABLE commands. There's nothing there about trying to import types that the tables might depend on. The SQL standard has an awful lot of holes, this one being about the size of the Chicxulub crater. That fact doesn't force our implementation to throw up its hands when it finds a feature we've implemented and encouraged people to use. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] libpq pipelining
Hi, The recent discussion about pipelining in the jodbc driver prompted me to look at what it would take for libpq. I have a proof of concept patch working. The results are even more promising than I expected. While it's true that many applications and frameworks won't easily benefit, it amazes me that this hasn't been explored before. I developed a simple test application that creates a table with a single auto increment primary key column, then runs a 4 simple queries x times each: INSERT INTO test() VALUES () SELECT * FROM test LIMIT 1 SELECT * FROM test DELETE FROM test The parameters to testPipelinedSeries are (number of times to execute each query, maximum number of queued queries). Results against local server: testPipelinedSeries(10,1) took 0.020884 testPipelinedSeries(10,3) took 0.020630, speedup 1.01 testPipelinedSeries(10,10) took 0.006265, speedup 3.33 testPipelinedSeries(100,1) took 0.042731 testPipelinedSeries(100,3) took 0.043035, speedup 0.99 testPipelinedSeries(100,10) took 0.037222, speedup 1.15 testPipelinedSeries(100,25) took 0.031223, speedup 1.37 testPipelinedSeries(100,50) took 0.032482, speedup 1.32 testPipelinedSeries(100,100) took 0.031356, speedup 1.36 Results against remote server through ssh tunnel(30-40ms rtt): testPipelinedSeries(10,1) took 3.2461736 testPipelinedSeries(10,3) took 1.1008443, speedup 2.44 testPipelinedSeries(10,10) took 0.342399, speedup 7.19 testPipelinedSeries(100,1) took 26.25882588 testPipelinedSeries(100,3) took 8.8509234, speedup 3.04 testPipelinedSeries(100,10) took 3.2866285, speedup 9.03 testPipelinedSeries(100,25) took 2.1472847, speedup 17.57 testPipelinedSeries(100,50) took 1.957510, speedup 27.03 testPipelinedSeries(100,100) took 0.690682, speedup 37.47 I plan to write documentation, add regression testing, and do general cleanup before asking for feedback on the patch itself. Any suggestions about performance testing or api design would be nice. I haven't played with changing the sync logic yet, but I'm guessing that an api to allow manual sync instead of a sync per PQsendQuery will be needed. That could make things tricky though with multi-statement queries, because currently the only way to detect when results change from one query to the next are a ReadyForQuery message. Matt Newell /* * src/test/examples/testlibpqpipeline.c * * * testlibpqpipeline.c * this test program test query pipelining and it's performance impact * * */ #include stdio.h #include stdlib.h #include sys/time.h #include libpq-fe.h // If defined we won't issue more sql commands if the socket's // write buffer is full //#define MIN_LOCAL_Q //#define PRINT_QUERY_PROGRESS static int testPipelined( PGconn * conn, int totalQueries, int totalQueued, const char * sql ); static int testPipelinedSeries( PGconn * conn, int totalQueries, int totalQueued, int baseline_usecs ); int testPipelined( PGconn * conn, int totalQueries, int totalQueued, const char * sql ) { int nQueriesQueued; int nQueriesTotal; PGresult * result; PGquery * firstQuery; PGquery * curQuery; nQueriesQueued = nQueriesTotal = 0; result = NULL; firstQuery = curQuery = NULL; while( nQueriesQueued 0 || nQueriesTotal totalQueries ) { if( PQconsumeInput(conn) == 0 ) { printf( PQconsumeInput ERROR: %s\n, PQerrorMessage(conn) ); return 1; } do { curQuery = PQgetFirstQuery(conn); /* firstQuery is finished */ if( firstQuery != curQuery ) { //printf( %p done, curQuery=%p\n, firstQuery, curQuery ); #ifdef PRINT_QUERY_PROGRESS printf(-); #endif firstQuery = curQuery; nQueriesQueued--; } /* Break if no queries are ready */ if( !firstQuery || PQisBusy(conn) ) break; if( (result = PQgetResult(conn)) != 0 ) PQclear(result); } while(1); if( nQueriesTotal totalQueries nQueriesQueued totalQueued ) { #ifdef MIN_LOCAL_Q int flushResult = PQflush(conn); if( flushResult == -1 ) { printf( PQflush ERROR: %s\n, PQerrorMessage(conn) ); return 1; } else if ( flushResult == 1 ) continue; #endif PQsendQuery(conn,sql); if( firstQuery == NULL ) firstQuery = PQgetFirstQuery(conn); nQueriesTotal++; nQueriesQueued++; #ifdef PRINT_QUERY_PROGRESS printf( + ); #endif } } #ifdef PRINT_QUERY_PROGRESS printf( \n ); #endif return 0; } int testPipelinedSeries( PGconn * conn, int totalQueries, int totalQueued, int baseline_usecs ) { int result; struct timeval tv1, tv2; int secs, usecs; gettimeofday(tv1,NULL); #define TEST_P(q) \ if( (result = testPipelined(conn,totalQueries,totalQueued,q)) != 0 ) \ return result; TEST_P(INSERT INTO test() VALUES ()); TEST_P(SELECT * FROM test LIMIT 1); TEST_P(SELECT * FROM test); TEST_P(DELETE FROM test); gettimeofday(tv2,NULL); secs = tv2.tv_sec - tv1.tv_sec; usecs = secs * 100 + tv2.tv_usec - tv1.tv_usec; printf(testPipelinedSeries(%i,%i) took
Re: [HACKERS] [COMMITTERS] pgsql: Fix whitespace
On 12/3/14 7:44 AM, Alvaro Herrera wrote: Does this mean you're happy with it now? Does make check-world pass for you? Yes, it all works now. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()
On Tue, Dec 2, 2014 at 1:11 PM, Robert Haas robertmh...@gmail.com wrote: Basically, the case in which I think it's helpful to issue a suggestion here is when the user has used the table name rather than the alias name. I wonder if it's worth checking for that case specifically, in lieu of what you've done here, and issuing a totally different hint in that case (HINT: You must refer to this as column as prime_minister.id rather than cameron.id). Well, if an alias is used, and you refer to an attribute using a non-alias name (i.e. the original table name), then you'll already get an error suggesting that the alias be used instead -- of course, that's nothing new. It doesn't matter to the existing hinting mechanism if the attribute name is otherwise wrong. Once you fix the code to use the alias suggested, you'll then get this new Levenshtein-based hint. Another idea, which I think I like less well, is to check the Levenshtein distance between the allowed alias and the entered alias and, if that's within the half-the-shorter-length threshold, consider possible matches from that RTE, charge the distance between the correct alias and the entered alias as a penalty to each potential column match. I don't about that either. Aliases are often totally arbitrary, particularly for ad-hoc queries, which is what this is aimed at. What I think won't do is to look at a situation where the user has entered automobile.id and suggest that maybe they meant student.iq, or even student.id. I'm not sure I follow. If there is an automobile.ip, then it will be suggested. If there is no automobile column that's much of a match (so no automobile.ip, say), then student.id will be suggested (and not student.iq, *even if there is no student.id* - the final quality check saves us). So this is possible: postgres=# select iq, * from student, automobile; ERROR: 42703: column iq does not exist LINE 1: select iq, * from student, automobile; ^ HINT: Perhaps you meant to reference the column student.id. postgres=# select automobile.iq, * from student, automobile; ERROR: 42703: column automobile.iq does not exist LINE 1: select automobile.iq, * from student, automobile; ^ (note that using the table name makes us *not* see a suggestion where we otherwise would). The point is that there is a fixed penalty for a wrong user-specified alias, but all relation RTEs are considered. The amount of difference between the names has got to matter for the RTE names, just as it does for the column names. I think it makes sense that it matters by a fixed amount. Besides, this seems complicated enough already - I don't won't to add more complexity to worry about equidistant (but still actually valid) RTE/table/alias names. It sounds like your concern here is mostly a concern about the relative distance among multiple matches, as opposed to the absolute quality of suggestions. The former seems a lot less controversial than the latter was, though - the user always gets the best match, or the join pair of best matches, or no match when this new hinting mechanism is involved. I attach a new revision. The revision: * Uses default costs for Levenshtein distance. * Still charges extra for a non-alias-matching match (although it only charges a fixed distance of 1 extra). This has regression test coverage. * Applies a generic final quality check that enforces a requirement that a hint have a distance of no greater than 50% of the total string size. No special treatment of shorter strings is involved anymore. * Moves almost everything out of scanRTEForColumn() as you outlined (into a new function, updateFuzzyAttrMatchState(), per your suggestion). * Moves dropped column detection into updateFuzzyAttrMatchState(), per your suggestion. * Still does the if (rte-rtekind == RTE_JOIN) thing in the existing function searchRangeTableForCol(). I am quite confident that a suggestion from a join RTE will never be useful, to either the existing use of searchRangeTableForCol() or this expanded use, and it makes more sense to me to put it there. In fact, the existing use of searchRangeTableForCol() is really rather similar to this, and will give up on the first identical match (which is taken as evidence that there is a attribute of that name, but isn't visible at this level of the query). So I have not followed your suggestion here. Thoughts? -- Peter Geoghegan From 81c7b0691e9d03c1bdd99f4b264737306d1bd2cf Mon Sep 17 00:00:00 2001 From: Peter Geoghegan p...@heroku.com Date: Wed, 12 Nov 2014 15:31:37 -0800 Subject: [PATCH] Levenshtein distance column HINT Add a new HINT -- a guess as to what column the user might have intended to reference, to be shown in various contexts where an ERRCODE_UNDEFINED_COLUMN error is raised. The user will see this HINT when he or she fat-fingers a column reference in an ad-hoc SQL query, or incorrectly pluralizes or fails to pluralize a column reference, or incorrectly omits or
Re: [HACKERS] tracking commit timestamps
On Wed, Dec 3, 2014 at 11:54 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Pushed with some extra cosmetic tweaks. I got the following assertion failure when I executed pg_xact_commit_timestamp() in the standby server. =# select pg_xact_commit_timestamp('1000'::xid); TRAP: FailedAssertion(!(((oldestCommitTs) != ((TransactionId) 0)) == ((newestCommitTs) != ((TransactionId) 0))), File: commit_ts.c, Line: 315) server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: 2014-12-04 12:01:08 JST sby1 LOG: server process (PID 15545) was terminated by signal 6: Aborted 2014-12-04 12:01:08 JST sby1 DETAIL: Failed process was running: select pg_xact_commit_timestamp('1000'::xid); The way to reproduce this problem is #1. set up and start the master and standby servers with track_commit_timestamp disabled #2. enable track_commit_timestamp in the master and restart the master #3. run some write transactions #4. enable track_commit_timestamp in the standby and restart the standby #5. execute select pg_xact_commit_timestamp('1000'::xid) in the standby BTW, at the step #4, I got the following log messages. This might be a hint for this problem. LOG: file pg_commit_ts/ doesn't exist, reading as zeroes CONTEXT: xlog redo Transaction/COMMIT: 2014-12-04 12:00:16.428702+09; inval msgs: catcache 59 catcache 58 catcache 59 catcache 58 catcache 45 catcache 44 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 snapshot 2608 relcache 16384 Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tracking commit timestamps
On 4 December 2014 at 03:08, Fujii Masao masao.fu...@gmail.com wrote: #1. set up and start the master and standby servers with track_commit_timestamp disabled #2. enable track_commit_timestamp in the master and restart the master #3. run some write transactions #4. enable track_commit_timestamp in the standby and restart the standby #5. execute select pg_xact_commit_timestamp('1000'::xid) in the standby I'm not sure what step4 is supposed to do? Surely if steps 1-3 generate any WAL then the standby should replay it, whether or not track_commit_timestamp is enabled. So what effect does setting that parameter on the standby? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tracking commit timestamps
On Thu, Dec 4, 2014 at 12:58 PM, Simon Riggs si...@2ndquadrant.com wrote: On 4 December 2014 at 03:08, Fujii Masao masao.fu...@gmail.com wrote: #1. set up and start the master and standby servers with track_commit_timestamp disabled #2. enable track_commit_timestamp in the master and restart the master #3. run some write transactions #4. enable track_commit_timestamp in the standby and restart the standby #5. execute select pg_xact_commit_timestamp('1000'::xid) in the standby I'm not sure what step4 is supposed to do? Surely if steps 1-3 generate any WAL then the standby should replay it, whether or not track_commit_timestamp is enabled. So what effect does setting that parameter on the standby? At least track_commit_timestamp seems to need to be enabled even in the standby when we want to call pg_xact_commit_timestamp() and pg_last_committed_xact() in the standby. I'm not sure if this is good design, though. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] inherit support for foreign tables
On Thu, Dec 4, 2014 at 9:05 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: (2014/12/03 19:35), Ashutosh Bapat wrote: On Tue, Dec 2, 2014 at 8:29 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp mailto:fujita.ets...@lab.ntt.co.jp wrote: This is not exactly extension of non-inheritance case. non-inheritance case doesn't show two remote SQLs under the same plan node. May be you can rename the label Remote SQL as Remote UPDATE/INSERT/DELETE (or something to that effect) for the DML command and the Foreign plan node should be renamed to Foreign access node or something to indicate that it does both the scan as well as DML. I am not keen about the actual terminology, but I think a reader of plan shouldn't get confused. We can leave this for committer's judgement. Thanks for the proposal! I think that would be a good idea. But I think there would be another idea. An example will be shown below. We show the update commands below the ModifyTable node, not above the corresponding ForeignScan nodes, so maybe less confusing. If there are no objections of you and others, I'll update the patch this way. postgres=# explain verbose update parent set a = a * 2 where a = 5; QUERY PLAN - Update on public.parent (cost=0.00..280.77 rows=25 width=10) On public.ft1 Remote SQL: UPDATE public.mytable_1 SET a = $2 WHERE ctid = $1 On public.ft2 Remote SQL: UPDATE public.mytable_2 SET a = $2 WHERE ctid = $1 - Seq Scan on public.parent (cost=0.00..0.00 rows=1 width=10) Output: (parent.a * 2), parent.ctid Filter: (parent.a = 5) - Foreign Scan on public.ft1 (cost=100.00..140.38 rows=12 width=10) Output: (ft1.a * 2), ft1.ctid Remote SQL: SELECT a, ctid FROM public.mytable_1 WHERE ((a = 5)) FOR UPDATE - Foreign Scan on public.ft2 (cost=100.00..140.38 rows=12 width=10) Output: (ft2.a * 2), ft2.ctid Remote SQL: SELECT a, ctid FROM public.mytable_2 WHERE ((a = 5)) FOR UPDATE (12 rows) Looks better. IIUC, even the transactions over the local and the *single* remote server are not guaranteed to be executed atomically in the current form. It is possible that the remote transaction succeeds and the local one fails, for example, resulting in data inconsistency between the local and the remote. IIUC, while committing transactions involving a single remote server, the steps taken are as follows 1. the local changes are brought to PRE-COMMIT stage, which means that the transaction *will* succeed locally after successful completion of this phase, 2. COMMIT message is sent to the foreign server 3. If step two succeeds, local changes are committed and successful commit is conveyed to the client 4. if step two fails, local changes are rolled back and abort status is conveyed to the client 5. If step 1 itself fails, the remote changes are rolled back. This is as per one phase commit protocol which guarantees ACID for single foreign data source. So, the changes involving local and a single foreign server seem to be atomic and consistent. Really? Maybe I'm missing something, but I don't think the current implementation for committing transactions has such a mechanism stated in step 1. So, I think it's possible that the local transaction fails in step3 while the remote transaction succeeds, as mentioned above. PFA a script attached which shows this. You may want to check the code in pgfdw_xact_callback() for actions taken by postgres_fdw on various events. CommitTransaction() for how those events are generated. The code there complies with the sequence above. Thanks, Best regards, Etsuro Fujita -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] inherit support for foreign tables
Sorry, here's the script. On Thu, Dec 4, 2014 at 10:00 AM, Ashutosh Bapat ashutosh.ba...@enterprisedb.com wrote: On Thu, Dec 4, 2014 at 9:05 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: (2014/12/03 19:35), Ashutosh Bapat wrote: On Tue, Dec 2, 2014 at 8:29 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp mailto:fujita.ets...@lab.ntt.co.jp wrote: This is not exactly extension of non-inheritance case. non-inheritance case doesn't show two remote SQLs under the same plan node. May be you can rename the label Remote SQL as Remote UPDATE/INSERT/DELETE (or something to that effect) for the DML command and the Foreign plan node should be renamed to Foreign access node or something to indicate that it does both the scan as well as DML. I am not keen about the actual terminology, but I think a reader of plan shouldn't get confused. We can leave this for committer's judgement. Thanks for the proposal! I think that would be a good idea. But I think there would be another idea. An example will be shown below. We show the update commands below the ModifyTable node, not above the corresponding ForeignScan nodes, so maybe less confusing. If there are no objections of you and others, I'll update the patch this way. postgres=# explain verbose update parent set a = a * 2 where a = 5; QUERY PLAN - Update on public.parent (cost=0.00..280.77 rows=25 width=10) On public.ft1 Remote SQL: UPDATE public.mytable_1 SET a = $2 WHERE ctid = $1 On public.ft2 Remote SQL: UPDATE public.mytable_2 SET a = $2 WHERE ctid = $1 - Seq Scan on public.parent (cost=0.00..0.00 rows=1 width=10) Output: (parent.a * 2), parent.ctid Filter: (parent.a = 5) - Foreign Scan on public.ft1 (cost=100.00..140.38 rows=12 width=10) Output: (ft1.a * 2), ft1.ctid Remote SQL: SELECT a, ctid FROM public.mytable_1 WHERE ((a = 5)) FOR UPDATE - Foreign Scan on public.ft2 (cost=100.00..140.38 rows=12 width=10) Output: (ft2.a * 2), ft2.ctid Remote SQL: SELECT a, ctid FROM public.mytable_2 WHERE ((a = 5)) FOR UPDATE (12 rows) Looks better. IIUC, even the transactions over the local and the *single* remote server are not guaranteed to be executed atomically in the current form. It is possible that the remote transaction succeeds and the local one fails, for example, resulting in data inconsistency between the local and the remote. IIUC, while committing transactions involving a single remote server, the steps taken are as follows 1. the local changes are brought to PRE-COMMIT stage, which means that the transaction *will* succeed locally after successful completion of this phase, 2. COMMIT message is sent to the foreign server 3. If step two succeeds, local changes are committed and successful commit is conveyed to the client 4. if step two fails, local changes are rolled back and abort status is conveyed to the client 5. If step 1 itself fails, the remote changes are rolled back. This is as per one phase commit protocol which guarantees ACID for single foreign data source. So, the changes involving local and a single foreign server seem to be atomic and consistent. Really? Maybe I'm missing something, but I don't think the current implementation for committing transactions has such a mechanism stated in step 1. So, I think it's possible that the local transaction fails in step3 while the remote transaction succeeds, as mentioned above. PFA a script attached which shows this. You may want to check the code in pgfdw_xact_callback() for actions taken by postgres_fdw on various events. CommitTransaction() for how those events are generated. The code there complies with the sequence above. Thanks, Best regards, Etsuro Fujita -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company tran_inconsistency.sql Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On partitioning
Hi, From: Jim Nasby [mailto:jim.na...@bluetreble.com] On 12/2/14, 9:43 PM, Amit Langote wrote: What are you going to do if the partitioning key has two columns of different data types? Sorry, this totally eluded me. Perhaps, the 'values' needs some more thought. They are one of the most crucial elements of the scheme. I wonder if your suggestion of pg_node_tree plays well here. This then could be a list of CONSTs or some such... And I am thinking it's a concern only for range partitions, no? (that is, a multicolumn partition key) I think partkind switches the interpretation of the field as appropriate. Am I missing something? By the way, I had mentioned we could have two values fields each for range and list partition kind. The more SQL way would be records (composite types). That would make catalog inspection a LOT easier and presumably make it easier to change the partitioning key (I'm assuming ALTER TYPE cascades to stored data). Records are stored internally as tuples; not sure if that would be faster than a List of Consts or a pg_node_tree. Nodes would theoretically allow using things other than Consts, but I suspect that would be a bad idea. While I couldn’t find an example in system catalogs where a record/composite type is used, there are instances of pg_node_tree at a number of places like in pg_attrdef and others. Could you please point me to such a usage for reference? Something else to consider... our user-space support for ranges is now rangetypes, so perhaps that's what we should use for range partitioning. The up-side (which would be a double-edged sword) is that you could leave holes in your partitioning map. Note that in the multi-key case we could still have a record of rangetypes. That is something I had mind at least at some point. My general doubt remains about the usage of user space SQL types for catalog fields though I may be completely uninitiated about such usage. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Parallel Seq Scan
As per discussion on another thread related to using custom scan nodes for prototype of parallel sequence scan, I have developed the same, but directly by adding new nodes for parallel sequence scan. There might be some advantages for developing this as a contrib module by using custom scan nodes, however I think we might get stucked after some point due to custom scan node capability as pointed out by Andres. The basic idea used is that while evaluating the cheapest path for scan, optimizer will also evaluate if it can use parallel seq path. Currently I have kept a very simple model to calculate the cost of parallel sequence path which is that divide the cost for CPU and disk by availble number of worker backends (We can enhance it based on further experiments and discussion; we need to consider worker startup and dynamic shared memory setup cost as well). The work aka scan of blocks is divided equally among all workers (except for corner cases where blocks can't be equally divided among workers, the last worker will be responsible for scanning the remaining blocks). The number of worker backends that can be used for parallel seq scan can be configured by using a new GUC parallel_seqscan_degree, the default value of which is zero and it means parallel seq scan will not be considered unless user configures this value. In ExecutorStart phase, initiate the required number of workers as per parallel seq scan plan and setup dynamic shared memory and share the information required for worker to execute the scan. Currently I have just shared the relId, targetlist and number of blocks to be scanned by worker, however I think we might want to generate a plan for each of the workers in master backend and then share the same to individual worker. Now to fetch the data from multiple queues corresponding to each worker a simple mechanism is used that is fetch from first queue till all the data is consumed from same, then fetch from second queue and so on. Also here master backend is responsible for just getting the data from workers and passing it back to client. I am sure that we can improve this strategy in many ways like by making master backend to also perform scan for some of the blocks rather than just getting data from workers and a better strategy to fetch the data from multiple queues. Worker backend will receive the information related to scan from master backend and generate the plan from same and execute that plan, so here the work to scan the data after generating the plan is very much similar to exec_simple_query() (i.e Create the portal and run it based on planned statement) except that worker backends will initialize the block range it want to scan in executor initialization phase (ExecInitSeqScan()). Workers will exit after sending the data to master backend which essentially means that for each execution we need to initiate the workers, I think here we can improve by giving the control for workers to postmaster so that we don't need to initialize them each time during execution, however this can be a totally separate optimization which is better to be done independently of this patch. As currently we don't have mechanism to share transaction state, I have used separate transaction in worker backend to execute the plan. Any error in master backend either via backend worker or due to other issue in master backend itself should terminate all the workers before aborting the transaction. We can't do it with the error context callback mechanism (error_context_stack) which we use at other places in code, as for this case we need it from the time workers are started till the execution is complete (error_context_stack could get reset once the control goes out of the function which has set it.) One way could be that maintain the callback information in TransactionState and use it to kill the workers before aborting transaction in main backend. Another could be that have another variable similar to error_context_stack (which will be used specifically for storing the workers state), and kill the workers in errfinish via callback. Currently I have handled it at the time of detaching from shared memory. Another point that needs to be taken care in worker backend is that if any error occurs, we should *not* abort the transaction as the transaction state is shared across all workers. Currently the parallel seq scan will not be considered for statements other than SELECT or if there is a join in the statement or if statement contains quals or if target list contains non-Var fields. We can definitely support simple quals and targetlist other than non-Vars. By simple, I means that it should not contain functions or some other conditions which can't be pushed down to worker backend. Behaviour of some simple statements with patch is as below: postgres=# create table t1(c1 int, c2 char(500)) with (fillfactor=10); CREATE TABLE postgres=# insert into t1 values(generate_series(1,100),'amit'); INSERT 0 100