[PERFORM] Partitions not Working as Expected

2013-06-27 Thread Shaun Thomas

Hey guys,

I suspect I'll get an answer equivalent to the planner treats that like 
a variable, but I really hope not because it renders partitions 
essentially useless to us. This is as recent as 9.1.9 and constraint 
exclusion is enabled.


What I have is this test case:

CREATE TABLE part_test (
  fake INT,
  part_col TIMESTAMP WITHOUT TIME ZONE
);

CREATE TABLE part_test_1 (
  CHECK (part_col = '2013-05-01' AND
 part_col  '2013-06-01')
) INHERITS (part_test);

CREATE TABLE part_test_2 (
  CHECK (part_col = '2013-04-01' AND
 part_col  '2013-05-01')
) INHERITS (part_test);

And this query performs a sequence scan across all partitions:

EXPLAIN ANALYZE
SELECT * FROM part_test
 WHERE part_col  CURRENT_DATE;

The CURRENT_DATE value is clearly more recent than any of the 
partitions, yet it checks them anyway. The only way to get it to 
properly constrain partitions is to use a static value:


EXPLAIN ANALYZE
SELECT * FROM part_test
 WHERE part_col  '2013-06-27';

But developers never do this. Nor should they. I feel like an idiot even 
asking this, because it seems so wrong, and I can't seem to come up with 
a workaround other than, Ok devs, hard code dates into all of your 
queries from now on.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
stho...@optionshouse.com

__

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to 
this email


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Partitions not Working as Expected

2013-06-27 Thread Igor Neyman
 -Original Message-
 From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-
 performance-ow...@postgresql.org] On Behalf Of Shaun Thomas
 Sent: Thursday, June 27, 2013 12:16 PM
 To: pgsql-performance@postgresql.org
 Subject: [PERFORM] Partitions not Working as Expected
 
 Hey guys,
 
 I suspect I'll get an answer equivalent to the planner treats that like a
 variable, but I really hope not because it renders partitions essentially
 useless to us. This is as recent as 9.1.9 and constraint exclusion is enabled.
 
 What I have is this test case:
 
 CREATE TABLE part_test (
fake INT,
part_col TIMESTAMP WITHOUT TIME ZONE
 );
 
 CREATE TABLE part_test_1 (
CHECK (part_col = '2013-05-01' AND
   part_col  '2013-06-01')
 ) INHERITS (part_test);
 
 CREATE TABLE part_test_2 (
CHECK (part_col = '2013-04-01' AND
   part_col  '2013-05-01')
 ) INHERITS (part_test);
 
 And this query performs a sequence scan across all partitions:
 
 EXPLAIN ANALYZE
 SELECT * FROM part_test
   WHERE part_col  CURRENT_DATE;
 
 The CURRENT_DATE value is clearly more recent than any of the partitions,
 yet it checks them anyway. The only way to get it to properly constrain
 partitions is to use a static value:
 
 EXPLAIN ANALYZE
 SELECT * FROM part_test
   WHERE part_col  '2013-06-27';
 
 But developers never do this. Nor should they. I feel like an idiot even 
 asking
 this, because it seems so wrong, and I can't seem to come up with a
 workaround other than, Ok devs, hard code dates into all of your queries
 from now on.
 
 --
 Shaun Thomas
 OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
 312-676-8870
 stho...@optionshouse.com
 

Doesn't have to be hardcoded.
If executed as dynamic sql, it will be re-planned properly, e.g.:

lQueryString := 'SELECT MAX(cycle_date_time) AS MaxDT
 FROM gp_cycle_' || partition_extension::varchar ||
 ' WHERE cell_id = ' || i_n_Cell_id::varchar ||
 ' AND part_type_id = ' || i_n_PartType_id::varchar ||
 ' AND cycle_date_time = TIMESTAMP ' || 
quote_literal(cast(i_t_EndDate AS VARCHAR));
   IF (lQueryString IS NOT NULL) THEN
   EXECUTE lQueryString INTO lEndDate;


Regards,
Igor Neyman


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Partitions not Working as Expected

2013-06-27 Thread Shaun Thomas

On 06/27/2013 12:08 PM, Igor Neyman wrote:


Doesn't have to be hardcoded.
If executed as dynamic sql, it will be re-planned properly, e.g.:


Well yeah. That's not really the point, though. Aside from existing 
code, hard-coding is generally frowned upon. Our devs have been using 
CURRENT_DATE and its ilk for over six years now.


So now I get to tell our devs to refactor six years of JAVA code and 
find any place they use CURRENT_DATE, and replace it with an ORM 
variable for the current date instead.


At this point I wonder why CURRENT_DATE even exists, if using it is 
apparently detrimental to query execution.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
stho...@optionshouse.com

__

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to 
this email


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Partitions not Working as Expected

2013-06-27 Thread bricklen
On Thu, Jun 27, 2013 at 10:17 AM, Shaun Thomas stho...@optionshouse.comwrote:


 Well yeah. That's not really the point, though. Aside from existing code,
 hard-coding is generally frowned upon. Our devs have been using
 CURRENT_DATE and its ilk for over six years now.


Would it help to put the current_date call in a wrapper function and coerce
it as IMMUTABLE? A quick test shows that constraint exclusion seems to kick
in, but I can't speak intelligently about whether that is wise or not.


Re: [PERFORM] Partitions not Working as Expected

2013-06-27 Thread Dave Johansen
On Thu, Jun 27, 2013 at 10:34 AM, bricklen brick...@gmail.com wrote:

 On Thu, Jun 27, 2013 at 10:17 AM, Shaun Thomas 
 stho...@optionshouse.comwrote:


 Well yeah. That's not really the point, though. Aside from existing code,
 hard-coding is generally frowned upon. Our devs have been using
 CURRENT_DATE and its ilk for over six years now.


 Would it help to put the current_date call in a wrapper function and
 coerce it as IMMUTABLE? A quick test shows that constraint exclusion seems
 to kick in, but I can't speak intelligently about whether that is wise or
 not.


Or what about something like DATE_TRUNC(DAY, now())? Or would that run
into the same optimization/planner problems as CURRENT_DATE?


Re: [PERFORM] Partitions not Working as Expected

2013-06-27 Thread Shaun Thomas

On 06/27/2013 12:42 PM, Dave Johansen wrote:


Or what about something like DATE_TRUNC(DAY, now())? Or would that run
into the same optimization/planner problems as CURRENT_DATE?


Same issue. This seems to work, though I'm not entirely sure of the 
implications:


UPDATE pg_proc
   SET provolatile = 'i'
 WHERE proname = 'date_in';

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
stho...@optionshouse.com

__

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to 
this email


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Partitions not Working as Expected

2013-06-27 Thread Tom Lane
Shaun Thomas stho...@optionshouse.com writes:
 On 06/27/2013 12:42 PM, Dave Johansen wrote:
 Or what about something like DATE_TRUNC(DAY, now())? Or would that run
 into the same optimization/planner problems as CURRENT_DATE?

 Same issue. This seems to work, though I'm not entirely sure of the 
 implications:

 UPDATE pg_proc
 SET provolatile = 'i'
   WHERE proname = 'date_in';

That will break things: CURRENT_DATE will then be equivalent to just
writing today's date as a literal.

It's conceivable that it wouldn't break any scenario that you personally
care about, if you never use CURRENT_DATE in any view, rule, column
default expression, or cached plan; but it seems mighty risky from here.


I don't see any very good solution to your problem within the current
approach to partitioning, which is basically theorem-proving.  That
proof engine has no concept of time passing, let alone the sort of
detailed knowledge of the semantics of this particular function that
would allow it to conclude if CURRENT_DATE  '2013-06-20' is true now,
it will always be so in the future as well.

I think most hackers agree that the way forward on partitioning involves
building hard-wired logic that selects the correct partition(s) at
run-time, so that it wouldn't particularly matter where we got the
comparison value from or whether it was a constant.  So I'm not feeling
motivated to try to hack some solution for this case into the theorem
prover.

Unfortunately, it's likely to be awhile before that next-generation
partitioning code shows up.  But major extensions to the proof engine
wouldn't be a weekend project, either...

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Partitions not Working as Expected

2013-06-27 Thread Albin, Lloyd P
We have also run into this with our production databases. We worked around the 
issue by adding an index to each child table so that it scans all the child 
index's instead of the child table's. For us this made a large performance 
improvement.

CREATE INDEX part_test_1_idx ON part_test_1
  USING btree (part_col);

CREATE INDEX part_test_2_idx ON part_test_2
  USING btree (part_col);

Lloyd Albin
Statistical Center for HIV/AIDS Research and Prevention (SCHARP)
Vaccine and Infectious Disease Division (VIDD)
Fred Hutchinson Cancer Research Center (FHCRC)

-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Shaun Thomas
Sent: Thursday, June 27, 2013 11:16 AM
To: Dave Johansen
Cc: bricklen; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Partitions not Working as Expected

On 06/27/2013 12:42 PM, Dave Johansen wrote:

 Or what about something like DATE_TRUNC(DAY, now())? Or would that 
 run into the same optimization/planner problems as CURRENT_DATE?

Same issue. This seems to work, though I'm not entirely sure of the
implications:

UPDATE pg_proc
SET provolatile = 'i'
  WHERE proname = 'date_in';

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
stho...@optionshouse.com

__

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to 
this email


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Partitions not Working as Expected

2013-06-27 Thread Shaun Thomas

On 06/27/2013 01:42 PM, Tom Lane wrote:


That will break things: CURRENT_DATE will then be equivalent to just
writing today's date as a literal.


Interesting. I tested it by creating a view and a table with a default, 
and it always seems to get translated to:


('now'::text)::date

But I'll take your explanation at face value, since that doesn't imply 
what the output would be. What's interesting is that EnterpriseDB has 
their own pg_catalog.current_date function that gets called by the 
CURRENT_DATE keyword. So unlike in vanilla PG, I could mark just the 
current_date function as immutable without affecting a lot of other 
internals.


On EDB, this actually works:

UPDATE pg_proc
   SET provolatile = 'i'
 WHERE proname = 'current_date';

Then the plan gets pared down as desired. But again, if the date were to 
roll over, I'm not sure what would happen. I wish I could test that 
without fiddling with machine times.



I don't see any very good solution to your problem within the current
approach to partitioning, which is basically theorem-proving.  That
proof engine has no concept of time passing, let alone the sort of
detailed knowledge of the semantics of this particular function that
would allow it to conclude if CURRENT_DATE  '2013-06-20' is true now,
it will always be so in the future as well.


I get it. From the context of two months ago, CURRENT_DATE  
'2013-06-20' would return a different answer than it would today, which 
isn't really good for proofs.


The only way for it to work as expected would be to add a first pass 
to resolve any immediate variables, which would effectively throw away 
plan caches. I'd actually be OK with that.



I think most hackers agree that the way forward on partitioning
involves building hard-wired logic that selects the correct
partition(s) at run-time, so that it wouldn't particularly matter
where we got the comparison value from or whether it was a constant.


Fair enough. I'll stop telling devs to use current_date instead of ORM 
injections, then. Hopefully we can track down and tweak the affected 
queries on the tables we're partitioning without too much work and QA.


Thanks, Tom!

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
stho...@optionshouse.com

__

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to 
this email


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Partitions not Working as Expected

2013-06-27 Thread Shaun Thomas

On 06/27/2013 01:45 PM, Albin, Lloyd P wrote:


We have also run into this with our production databases. We worked
around the issue by adding an index to each child table so that it
scans all the child index's instead of the child table's. For us
this made a large performance improvement.


Haha. Yeah, that's assumed. I'd never use a partition set without the 
constraint column in at least one index. The proof of concept was just 
to illustrate that the planner doesn't even get that far in ignoring 
empty partitions. Sure, scanning the inapplicable child tables has a 
low cost, but it's not zero. With about a dozen of them, query times 
increase from 0.130ms to 0.280ms for my test case. Not a lot in the long 
run, but in a OLTP system, it can be fairly noticeable.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
stho...@optionshouse.com

__

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to 
this email


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Partitions not Working as Expected

2013-06-27 Thread Andrew Dunstan


On 06/27/2013 03:14 PM, Shaun Thomas wrote:

On 06/27/2013 01:42 PM, Tom Lane wrote:


That will break things: CURRENT_DATE will then be equivalent to just
writing today's date as a literal.


Interesting. I tested it by creating a view and a table with a 
default, and it always seems to get translated to:


('now'::text)::date

But I'll take your explanation at face value, since that doesn't imply 
what the output would be. What's interesting is that EnterpriseDB has 
their own pg_catalog.current_date function that gets called by the 
CURRENT_DATE keyword. So unlike in vanilla PG, I could mark just the 
current_date function as immutable without affecting a lot of other 
internals.


On EDB, this actually works:

UPDATE pg_proc
   SET provolatile = 'i'
 WHERE proname = 'current_date';



But that's a lie, surely. If it breaks you have nobody to blame but 
yourself. There's a reason EDB haven't marked their function immutable - 
it's not.


cheers

andrew



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Partitions not Working as Expected

2013-06-27 Thread Shaun Thomas

On 06/27/2013 02:49 PM, Andrew Dunstan wrote:


But that's a lie, surely. If it breaks you have nobody to blame but
yourself. There's a reason EDB haven't marked their function
immutable - it's not.


Well, yeah. That's why I'm testing it in a dev system. :)

None of this will probably pan out, but I need to see the limits of how 
badly I can abuse the database.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
stho...@optionshouse.com

__

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to 
this email


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Partitions not Working as Expected

2013-06-27 Thread Andres Freund
On 2013-06-27 14:42:26 -0400, Tom Lane wrote:
 Shaun Thomas stho...@optionshouse.com writes:
  On 06/27/2013 12:42 PM, Dave Johansen wrote:
  Or what about something like DATE_TRUNC(DAY, now())? Or would that run
  into the same optimization/planner problems as CURRENT_DATE?
 
  Same issue. This seems to work, though I'm not entirely sure of the 
  implications:
 
  UPDATE pg_proc
  SET provolatile = 'i'
WHERE proname = 'date_in';
 
 That will break things: CURRENT_DATE will then be equivalent to just
 writing today's date as a literal.
 
 It's conceivable that it wouldn't break any scenario that you personally
 care about, if you never use CURRENT_DATE in any view, rule, column
 default expression, or cached plan; but it seems mighty risky from here.

 I don't see any very good solution to your problem within the current
 approach to partitioning, which is basically theorem-proving.  That
 proof engine has no concept of time passing, let alone the sort of
 detailed knowledge of the semantics of this particular function that
 would allow it to conclude if CURRENT_DATE  '2013-06-20' is true now,
 it will always be so in the future as well.

Couldn't we at least significantly improve on the status quo by
detecting we're currently planning a query that's only going to be
executed once (because it's directly executed or because were planning a
onetime plan for specific parameters) and inline stable functions before
doing the theorem proving?

Maybe I am missing something here?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Partitions not Working as Expected

2013-06-27 Thread Josh Berkus

 At this point I wonder why CURRENT_DATE even exists, if using it is
 apparently detrimental to query execution.

It's good for inserts.  ;-)


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Partitions not Working as Expected

2013-06-27 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 Couldn't we at least significantly improve on the status quo by
 detecting we're currently planning a query that's only going to be
 executed once (because it's directly executed or because were planning a
 onetime plan for specific parameters) and inline stable functions before
 doing the theorem proving?

I think Haas went down that rabbit hole before you.  The current
definition of stable functions is not strong enough to guarantee that a
plan-time evaluation would give the same result as a run-time
evaluation, not even in one-shot-plan cases.  The obvious reason why not
is that the planner isn't using the same snapshot that the executor will
use (which is not that easy to change, see his failed patch from a year
or so back).  But even if we rejiggered things enough so the query did
use the same snapshot that'd been used for planning, I'm not very
convinced that such an assumption would be valid.  The assumptions for
stable functions are pretty weak really.

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Partitions not Working as Expected

2013-06-27 Thread Tom Lane
Shaun Thomas stho...@optionshouse.com writes:
 On 06/27/2013 01:42 PM, Tom Lane wrote:
 That will break things: CURRENT_DATE will then be equivalent to just
 writing today's date as a literal.

 Interesting. I tested it by creating a view and a table with a default, 
 and it always seems to get translated to:
 ('now'::text)::date

Yeah, that is what the parser does with it.  The way to read that is
a constant of type text, containing the string 'now', to which is
applied a run-time coercion to type date.  The run-time coercion is
equivalent to (and implemented by) calling text_out then date_in.
If date_in is marked immutable, then the planner will correctly conclude
that it can fold the whole thing to a date constant on sight.  Now you
have a plan with a hard-wired value for the current date, which will
begin to give wrong answers after midnight passes.  If your usage
pattern is such that no query plan survives across a day boundary,
you might not notice ... but it's still wrong.

 ... What's interesting is that EnterpriseDB has 
 their own pg_catalog.current_date function that gets called by the 
 CURRENT_DATE keyword.

Yeah, we really ought to do likewise in the community code.  But that
doesn't affect the fundamental semantic issue here, which is that you
can't mark the expression immutable without creating incorrect cached
plans.

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance