Re: [PERFORM] Partitions not Working as Expected
Shaun Thomas writes: > On 06/27/2013 01:42 PM, Tom Lane wrote: >> That will break things: CURRENT_DATE will then be equivalent to just >> writing today's date as a literal. > Interesting. I tested it by creating a view and a table with a default, > and it always seems to get translated to: > ('now'::text)::date Yeah, that is what the parser does with it. The way to read that is "a constant of type text, containing the string 'now', to which is applied a run-time coercion to type date". The run-time coercion is equivalent to (and implemented by) calling text_out then date_in. If date_in is marked immutable, then the planner will correctly conclude that it can fold the whole thing to a date constant on sight. Now you have a plan with a hard-wired value for the current date, which will begin to give wrong answers after midnight passes. If your usage pattern is such that no query plan survives across a day boundary, you might not notice ... but it's still wrong. > ... What's interesting is that EnterpriseDB has > their own pg_catalog.current_date function that gets called by the > CURRENT_DATE keyword. Yeah, we really ought to do likewise in the community code. But that doesn't affect the fundamental semantic issue here, which is that you can't mark the expression immutable without creating incorrect cached plans. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Partitions not Working as Expected
Andres Freund writes: > Couldn't we at least significantly improve on the status quo by > detecting we're currently planning a query that's only going to be > executed once (because it's directly executed or because were planning a > onetime plan for specific parameters) and inline stable functions before > doing the theorem proving? I think Haas went down that rabbit hole before you. The current definition of stable functions is not strong enough to guarantee that a plan-time evaluation would give the same result as a run-time evaluation, not even in one-shot-plan cases. The obvious reason why not is that the planner isn't using the same snapshot that the executor will use (which is not that easy to change, see his failed patch from a year or so back). But even if we rejiggered things enough so the query did use the same snapshot that'd been used for planning, I'm not very convinced that such an assumption would be valid. The assumptions for stable functions are pretty weak really. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Partitions not Working as Expected
> At this point I wonder why CURRENT_DATE even exists, if using it is > apparently detrimental to query execution. It's good for inserts. ;-) -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Partitions not Working as Expected
On 2013-06-27 14:42:26 -0400, Tom Lane wrote: > Shaun Thomas writes: > > On 06/27/2013 12:42 PM, Dave Johansen wrote: > >> Or what about something like DATE_TRUNC("DAY", now())? Or would that run > >> into the same optimization/planner problems as CURRENT_DATE? > > > Same issue. This seems to work, though I'm not entirely sure of the > > implications: > > > UPDATE pg_proc > > SET provolatile = 'i' > > WHERE proname = 'date_in'; > > That will break things: CURRENT_DATE will then be equivalent to just > writing today's date as a literal. > > It's conceivable that it wouldn't break any scenario that you personally > care about, if you never use CURRENT_DATE in any view, rule, column > default expression, or cached plan; but it seems mighty risky from here. > I don't see any very good solution to your problem within the current > approach to partitioning, which is basically theorem-proving. That > proof engine has no concept of time passing, let alone the sort of > detailed knowledge of the semantics of this particular function that > would allow it to conclude "if CURRENT_DATE > '2013-06-20' is true now, > it will always be so in the future as well". Couldn't we at least significantly improve on the status quo by detecting we're currently planning a query that's only going to be executed once (because it's directly executed or because were planning a onetime plan for specific parameters) and inline stable functions before doing the theorem proving? Maybe I am missing something here? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Partitions not Working as Expected
On 06/27/2013 02:49 PM, Andrew Dunstan wrote: But that's a lie, surely. If it breaks you have nobody to blame but yourself. There's a reason EDB haven't marked their function immutable - it's not. Well, yeah. That's why I'm testing it in a dev system. :) None of this will probably pan out, but I need to see the limits of how badly I can abuse the database. -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604 312-676-8870 stho...@optionshouse.com __ See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Partitions not Working as Expected
On 06/27/2013 03:14 PM, Shaun Thomas wrote: On 06/27/2013 01:42 PM, Tom Lane wrote: That will break things: CURRENT_DATE will then be equivalent to just writing today's date as a literal. Interesting. I tested it by creating a view and a table with a default, and it always seems to get translated to: ('now'::text)::date But I'll take your explanation at face value, since that doesn't imply what the output would be. What's interesting is that EnterpriseDB has their own pg_catalog.current_date function that gets called by the CURRENT_DATE keyword. So unlike in vanilla PG, I could mark just the current_date function as immutable without affecting a lot of other internals. On EDB, this actually works: UPDATE pg_proc SET provolatile = 'i' WHERE proname = 'current_date'; But that's a lie, surely. If it breaks you have nobody to blame but yourself. There's a reason EDB haven't marked their function immutable - it's not. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Partitions not Working as Expected
On 06/27/2013 01:45 PM, Albin, Lloyd P wrote: We have also run into this with our production databases. We worked around the issue by adding an index to each child table so that it scans all the child index's instead of the child table's. For us this made a large performance improvement. Haha. Yeah, that's assumed. I'd never use a partition set without the constraint column in at least one index. The proof of concept was just to illustrate that the planner doesn't even get that far in ignoring "empty" partitions. Sure, scanning the inapplicable child tables has a low cost, but it's not zero. With about a dozen of them, query times increase from 0.130ms to 0.280ms for my test case. Not a lot in the long run, but in a OLTP system, it can be fairly noticeable. -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604 312-676-8870 stho...@optionshouse.com __ See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Partitions not Working as Expected
On 06/27/2013 01:42 PM, Tom Lane wrote: That will break things: CURRENT_DATE will then be equivalent to just writing today's date as a literal. Interesting. I tested it by creating a view and a table with a default, and it always seems to get translated to: ('now'::text)::date But I'll take your explanation at face value, since that doesn't imply what the output would be. What's interesting is that EnterpriseDB has their own pg_catalog.current_date function that gets called by the CURRENT_DATE keyword. So unlike in vanilla PG, I could mark just the current_date function as immutable without affecting a lot of other internals. On EDB, this actually works: UPDATE pg_proc SET provolatile = 'i' WHERE proname = 'current_date'; Then the plan gets pared down as desired. But again, if the date were to roll over, I'm not sure what would happen. I wish I could test that without fiddling with machine times. I don't see any very good solution to your problem within the current approach to partitioning, which is basically theorem-proving. That proof engine has no concept of time passing, let alone the sort of detailed knowledge of the semantics of this particular function that would allow it to conclude "if CURRENT_DATE > '2013-06-20' is true now, it will always be so in the future as well". I get it. From the context of two months ago, CURRENT_DATE > '2013-06-20' would return a different answer than it would today, which isn't really good for proofs. The only way for it to work as "expected" would be to add a first pass to resolve any immediate variables, which would effectively throw away plan caches. I'd actually be OK with that. I think most hackers agree that the way forward on partitioning involves building hard-wired logic that selects the correct partition(s) at run-time, so that it wouldn't particularly matter where we got the comparison value from or whether it was a constant. Fair enough. I'll stop telling devs to use current_date instead of ORM injections, then. Hopefully we can track down and tweak the affected queries on the tables we're partitioning without too much work and QA. Thanks, Tom! -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604 312-676-8870 stho...@optionshouse.com __ See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Partitions not Working as Expected
We have also run into this with our production databases. We worked around the issue by adding an index to each child table so that it scans all the child index's instead of the child table's. For us this made a large performance improvement. CREATE INDEX part_test_1_idx ON part_test_1 USING btree (part_col); CREATE INDEX part_test_2_idx ON part_test_2 USING btree (part_col); Lloyd Albin Statistical Center for HIV/AIDS Research and Prevention (SCHARP) Vaccine and Infectious Disease Division (VIDD) Fred Hutchinson Cancer Research Center (FHCRC) -Original Message- From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Shaun Thomas Sent: Thursday, June 27, 2013 11:16 AM To: Dave Johansen Cc: bricklen; pgsql-performance@postgresql.org Subject: Re: [PERFORM] Partitions not Working as Expected On 06/27/2013 12:42 PM, Dave Johansen wrote: > Or what about something like DATE_TRUNC("DAY", now())? Or would that > run into the same optimization/planner problems as CURRENT_DATE? Same issue. This seems to work, though I'm not entirely sure of the implications: UPDATE pg_proc SET provolatile = 'i' WHERE proname = 'date_in'; -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604 312-676-8870 stho...@optionshouse.com __ See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Partitions not Working as Expected
Shaun Thomas writes: > On 06/27/2013 12:42 PM, Dave Johansen wrote: >> Or what about something like DATE_TRUNC("DAY", now())? Or would that run >> into the same optimization/planner problems as CURRENT_DATE? > Same issue. This seems to work, though I'm not entirely sure of the > implications: > UPDATE pg_proc > SET provolatile = 'i' > WHERE proname = 'date_in'; That will break things: CURRENT_DATE will then be equivalent to just writing today's date as a literal. It's conceivable that it wouldn't break any scenario that you personally care about, if you never use CURRENT_DATE in any view, rule, column default expression, or cached plan; but it seems mighty risky from here. I don't see any very good solution to your problem within the current approach to partitioning, which is basically theorem-proving. That proof engine has no concept of time passing, let alone the sort of detailed knowledge of the semantics of this particular function that would allow it to conclude "if CURRENT_DATE > '2013-06-20' is true now, it will always be so in the future as well". I think most hackers agree that the way forward on partitioning involves building hard-wired logic that selects the correct partition(s) at run-time, so that it wouldn't particularly matter where we got the comparison value from or whether it was a constant. So I'm not feeling motivated to try to hack some solution for this case into the theorem prover. Unfortunately, it's likely to be awhile before that next-generation partitioning code shows up. But major extensions to the proof engine wouldn't be a weekend project, either... regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Partitions not Working as Expected
On 06/27/2013 12:42 PM, Dave Johansen wrote: Or what about something like DATE_TRUNC("DAY", now())? Or would that run into the same optimization/planner problems as CURRENT_DATE? Same issue. This seems to work, though I'm not entirely sure of the implications: UPDATE pg_proc SET provolatile = 'i' WHERE proname = 'date_in'; -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604 312-676-8870 stho...@optionshouse.com __ See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Partitions not Working as Expected
On Thu, Jun 27, 2013 at 10:34 AM, bricklen wrote: > On Thu, Jun 27, 2013 at 10:17 AM, Shaun Thomas > wrote: > >> >> Well yeah. That's not really the point, though. Aside from existing code, >> hard-coding is generally frowned upon. Our devs have been using >> CURRENT_DATE and its ilk for over six years now. >> > > Would it help to put the current_date call in a wrapper function and > coerce it as IMMUTABLE? A quick test shows that constraint exclusion seems > to kick in, but I can't speak intelligently about whether that is wise or > not. > > Or what about something like DATE_TRUNC("DAY", now())? Or would that run into the same optimization/planner problems as CURRENT_DATE?
Re: [PERFORM] Partitions not Working as Expected
On Thu, Jun 27, 2013 at 10:17 AM, Shaun Thomas wrote: > > Well yeah. That's not really the point, though. Aside from existing code, > hard-coding is generally frowned upon. Our devs have been using > CURRENT_DATE and its ilk for over six years now. > Would it help to put the current_date call in a wrapper function and coerce it as IMMUTABLE? A quick test shows that constraint exclusion seems to kick in, but I can't speak intelligently about whether that is wise or not.
Re: [PERFORM] Partitions not Working as Expected
On 06/27/2013 12:08 PM, Igor Neyman wrote: Doesn't have to be hardcoded. If executed as dynamic sql, it will be re-planned properly, e.g.: Well yeah. That's not really the point, though. Aside from existing code, hard-coding is generally frowned upon. Our devs have been using CURRENT_DATE and its ilk for over six years now. So now I get to tell our devs to refactor six years of JAVA code and find any place they use CURRENT_DATE, and replace it with an ORM variable for the current date instead. At this point I wonder why CURRENT_DATE even exists, if using it is apparently detrimental to query execution. -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604 312-676-8870 stho...@optionshouse.com __ See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Partitions not Working as Expected
> -Original Message- > From: pgsql-performance-ow...@postgresql.org [mailto:pgsql- > performance-ow...@postgresql.org] On Behalf Of Shaun Thomas > Sent: Thursday, June 27, 2013 12:16 PM > To: pgsql-performance@postgresql.org > Subject: [PERFORM] Partitions not Working as Expected > > Hey guys, > > I suspect I'll get an answer equivalent to "the planner treats that like a > variable," but I really hope not because it renders partitions essentially > useless to us. This is as recent as 9.1.9 and constraint exclusion is enabled. > > What I have is this test case: > > CREATE TABLE part_test ( >fake INT, >part_col TIMESTAMP WITHOUT TIME ZONE > ); > > CREATE TABLE part_test_1 ( >CHECK (part_col >= '2013-05-01' AND > part_col < '2013-06-01') > ) INHERITS (part_test); > > CREATE TABLE part_test_2 ( >CHECK (part_col >= '2013-04-01' AND > part_col < '2013-05-01') > ) INHERITS (part_test); > > And this query performs a sequence scan across all partitions: > > EXPLAIN ANALYZE > SELECT * FROM part_test > WHERE part_col > CURRENT_DATE; > > The CURRENT_DATE value is clearly more recent than any of the partitions, > yet it checks them anyway. The only way to get it to properly constrain > partitions is to use a static value: > > EXPLAIN ANALYZE > SELECT * FROM part_test > WHERE part_col > '2013-06-27'; > > But developers never do this. Nor should they. I feel like an idiot even > asking > this, because it seems so wrong, and I can't seem to come up with a > workaround other than, "Ok devs, hard code dates into all of your queries > from now on." > > -- > Shaun Thomas > OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604 > 312-676-8870 > stho...@optionshouse.com > Doesn't have to be hardcoded. If executed as dynamic sql, it will be re-planned properly, e.g.: lQueryString := 'SELECT MAX(cycle_date_time) AS MaxDT FROM gp_cycle_' || partition_extension::varchar || ' WHERE cell_id = ' || i_n_Cell_id::varchar || ' AND part_type_id = ' || i_n_PartType_id::varchar || ' AND cycle_date_time <= TIMESTAMP ' || quote_literal(cast(i_t_EndDate AS VARCHAR)); IF (lQueryString IS NOT NULL) THEN EXECUTE lQueryString INTO lEndDate; Regards, Igor Neyman -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Partitions not Working as Expected
Hey guys, I suspect I'll get an answer equivalent to "the planner treats that like a variable," but I really hope not because it renders partitions essentially useless to us. This is as recent as 9.1.9 and constraint exclusion is enabled. What I have is this test case: CREATE TABLE part_test ( fake INT, part_col TIMESTAMP WITHOUT TIME ZONE ); CREATE TABLE part_test_1 ( CHECK (part_col >= '2013-05-01' AND part_col < '2013-06-01') ) INHERITS (part_test); CREATE TABLE part_test_2 ( CHECK (part_col >= '2013-04-01' AND part_col < '2013-05-01') ) INHERITS (part_test); And this query performs a sequence scan across all partitions: EXPLAIN ANALYZE SELECT * FROM part_test WHERE part_col > CURRENT_DATE; The CURRENT_DATE value is clearly more recent than any of the partitions, yet it checks them anyway. The only way to get it to properly constrain partitions is to use a static value: EXPLAIN ANALYZE SELECT * FROM part_test WHERE part_col > '2013-06-27'; But developers never do this. Nor should they. I feel like an idiot even asking this, because it seems so wrong, and I can't seem to come up with a workaround other than, "Ok devs, hard code dates into all of your queries from now on." -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604 312-676-8870 stho...@optionshouse.com __ See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance