Re: [PATCHES] Partitioning docs

Simon Riggs Mon, 31 Oct 2005 14:49:06 -0800

On Mon, 2005-10-31 at 02:46 +0000, Simon Riggs wrote:
> I've been working on some docs for Constraining Exclusion & Partitioning
> for some time now. Deadlines seem to be looming, or may even have
> passed, so it seems sensible to submit what I have now.


> Many thanks to Josh Berkus for providing the numbered section on
> implementation process, which was the starting point I'd been looking
> for to describe everything else.

I believe this is now complete and ready for application.

- passes sgml make against cvstip
- spellchecked
- all code executed correctly against RC1

Comments please? Apart from the obvious, so why did it take you so long.
Apologies to the translators.

Best Regards, Simon Riggs

Index: ddl.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/ddl.sgml,v
retrieving revision 1.45
diff -c -c -r1.45 ddl.sgml
*** ddl.sgml	23 Oct 2005 19:29:49 -0000	1.45
--- ddl.sgml	31 Oct 2005 22:36:26 -0000
***************
*** 398,403 ****
--- 398,410 ----
      ensure that a column does not contain null values, the not-null
      constraint described in the next section can be used.
     </para>
+ 
+     <para>
+      Check constraints can also be used to enhance performance with
+      very large tables, when used in conjunction with the 
+      <xref linkend="guc-constraint-exclusion"> parameter. 
+      This is discussed in more detail in <xref linkend="ce-partitioning">
+    </para>
    </sect2>
  
    <sect2>
***************
*** 1040,1052 ****
   <sect1 id="ddl-inherit">
    <title>Inheritance</title>
  
!   <remark>This section needs to be rethought.  Some of the
!   information should go into the following chapters.</remark>
  
    <para>
!    Let's create two tables. The capitals  table  contains
!    state  capitals  which  are also cities. Naturally, the
!    capitals table should inherit from cities.
  
  <programlisting>
  CREATE TABLE cities (
--- 1047,1081 ----
   <sect1 id="ddl-inherit">
    <title>Inheritance</title>
  
!    <indexterm>
!     <primary>not-null constraint</primary>
!    </indexterm>
! 
!    <indexterm>
!     <primary>constraint</primary>
!     <secondary>NOT NULL</secondary>
!    </indexterm>
! 
!   <para>
!    <productname>PostgreSQL</productname> was the first DBMS to introduce
!    inheritance, one of its many object-relational features.
!    <productname>PostgreSQL</productname> implements table
!    inheritance which can be a useful tool for database designers. 
!    The SQL:2003 standard optionally defines type inheritance which differs
!    in many respects from the features described here.
!   </para>
  
    <para>
!    Let's start with an example: 
!    We're trying to build a data model for cities, but we have a problem.
!    Each state has many cities, but only one capital. We want to be able
!    to quickly retrieve the capital city for any particular state. We
!    can solve this problem by creating two tables. The capitals table contains
!    state  capitals, then we have another table for cities that aren't capitals.
!    What happens when we want to ask for data about a city, regardless of
!    whether it is a capital or not? We can use the inheritance feature to
!    help resolve this problem for us. We define the capitals table so that
!    it inherits from cities.
  
  <programlisting>
  CREATE TABLE cities (
***************
*** 1062,1077 ****
  
     In this case, a row of capitals <firstterm>inherits</firstterm> all
     attributes (name, population, and altitude) from its parent, cities.  State
!    capitals have an extra attribute, state, that shows their state.  In
!    <productname>PostgreSQL</productname>, a table can inherit from zero or
     more other tables, and a query can reference either all rows of a table or
     all rows of a table plus all of its descendants.
- 
-    <note>
-     <para>
-      The inheritance hierarchy is actually a directed acyclic graph.
-     </para>
-    </note>
    </para>
  
    <para>
--- 1091,1102 ----
  
     In this case, a row of capitals <firstterm>inherits</firstterm> all
     attributes (name, population, and altitude) from its parent, cities.  State
!    capitals have an extra attribute, state, that shows their state.  
!   </para>
!   <para>
!    In <productname>PostgreSQL</productname>, a table can inherit from zero or
     more other tables, and a query can reference either all rows of a table or
     all rows of a table plus all of its descendants.
    </para>
  
    <para>
***************
*** 1133,1163 ****
    </para>
  </note>
    
!   <note>
!    <title>Deprecated</title> 
!    <para>
!      In previous versions of <productname>PostgreSQL</productname>, the
!      default behavior was not to include child tables in queries. This was
!      found to be error prone and is also in violation of the SQL:2003
!      standard. Under the old syntax, to get the sub-tables you append
!      <literal>*</literal> to the table name.
!      For example
! <programlisting>
! SELECT * from cities*;
! </programlisting>
!      You can still explicitly specify scanning child tables by appending
!      <literal>*</literal>, as well as explicitly specify not scanning child tables by
!      writing <quote>ONLY</quote>.  But beginning in version 7.1, the default
!      behavior for an undecorated table name is to scan its child tables
!      too, whereas before the default was not to do so.  To get the old
!      default behavior, set the configuration option
!      <literal>SQL_Inheritance</literal> to off, e.g.,
! <programlisting>
! SET SQL_Inheritance TO OFF;
! </programlisting>
!      or add a line in your <filename>postgresql.conf</filename> file.
!    </para>
!   </note>
  
    <para>
    In some cases you may wish to know which table a particular row
--- 1158,1180 ----
    </para>
  </note>
    
!   <para>
!    Inheritance does not automatically propogate data from INSERT or COPY 
!    commands to other tables in the inheritance hierarchy. Inheritance
!    does not influence these commands.
!    In our example, the following INSERT statement will fail.
! <programlisting>
! INSERT INTO cities 
! (name, population, altitude, state)
! VALUES ('New York', NULL, NULL, 'NY');
! </programlisting>         
!    We might hope that the data would be somehow routed to the capitals table,
!    though this does not happen. If the child has no locally defined
!    columns, then it is possible to route data from the parent to the child
!    using a RULE, see <xref linkend="rules-update">. 
!    This is not possible with the above INSERT statement
!    because the state column does not exist on both parent and child tables.
!   </para>
  
    <para>
    In some cases you may wish to know which table a particular row
***************
*** 1204,1220 ****
    </para>
  
    <para>
!    A table can inherit from more than one parent table, in which case it has
!    the union of the columns defined by the parent tables (plus any columns
!    declared specifically for the child table).
    </para>
  
    <para>
!    A serious limitation of the inheritance feature is that indexes (including
!    unique constraints) and foreign key constraints only apply to single
!    tables, not to their inheritance children.  This is true on both the
!    referencing and referenced sides of a foreign key constraint.  Thus,
!    in the terms of the above example:
  
     <itemizedlist>
      <listitem>
--- 1221,1287 ----
    </para>
  
    <para>
!    As shown above, a child table may locally define columns as 
!    well as inheriting them from their parents. 
!    However, polymorphism of table datatypes is not supported: a locally defined
!    column cannot override the datatype of an inherited column of the same name.
!   </para>
! 
!   <para>
!    A table can inherit from a table that has itself inherited from 
!    other tables. A table can also inherit from more than one parent 
!    table, in which case it has the union of the columns defined by the parent 
!    tables (plus any columns declared locally for the child table). 
!    Inherited columns with duplicate names and datatypes will be merged so
!    that only a single column is stored. An inheritance definition that 
!    results in a column with more than one datatype would not succeed. 
!    The full inheritance hierarchy is actually a directed acyclic graph of
!    arbitrary complexity. You should note that complex inheritance hierarchies
!    will cause a corresponding increase in parsing time for your queries. 
!    Direct access to the specific table your query requires will always
!    provide the best performance.
    </para>
  
    <para>
!    Table inheritance can currently only be defined using the 
!    <xref linkend="SQL-CREATETABLE"> statement.
!    The related statement CREATE TABLE ... AS SELECT does
!    not allow the specification of an inheritance link. The ALTER TABLE statement
!    does not currently support the INHERITS() clause. There is no
!    way to add an inheritance link to make an existing table into a child
!    table. Similarly, there is no way to remove an inheritance link from a 
!    child table, once defined, other than using DROP TABLE on the child table.
!    A parent table cannot be dropped while any of its children remain. If you 
!    wish to remove a table and all of its descendants, then you can do so 
!    using the CASCADE option of the <xref linkend="SQL-DROPTABLE"> statement.
!   </para>
! 
!   <para>
!    Check constraints can be defined on tables within an inheritance
!    hierarchy. All check constraints on a parent table are automatically
!    inherited by all of their children. Note that is currently possible to 
!    inherit mutually exclusive check constraints, but that definition quickly 
!    shows itself since all attempted row inserts will be rejected.
!   </para>
! 
!   <para>
!    <xref linkend="SQL-ALTERTABLE"> will propogate any changes in data 
!    definition on columns or check constraints down the inheritance hierarchy. 
!    Again, dropping columns or constraints on parent tables is only possible 
!    when using the CASCADE option. ALTER TABLE follows the same rules for 
!    duplicate column merging and rejection that apply during CREATE TABLE.
!   </para>
! 
!   <para>
!    Both parent and child tables can have primary and foreign keys
!    defined for them, so that they can take part normally on both the 
!    referencing and referenced sides of a foreign key constraint. Indexes
!    may be defined on any of these columns whether or not they are inherited.
!    However, a serious current limitation of the inheritance feature is that 
!    indexes (including unique constraints) and foreign key constraints only 
!    apply to single tables and do not also index their inheritance children.  
!    This is true on both sides of a foreign key constraint.  
!    Thus, in the terms of the above example:
  
     <itemizedlist>
      <listitem>
***************
*** 1236,1244 ****
        Similarly, if we were to specify that
        <structname>cities</>.<structfield>name</> <literal>REFERENCES</> some
        other table, this constraint would not automatically propagate to
!       <structname>capitals</>.  In this case you could work around it by
!       manually adding the same <literal>REFERENCES</> constraint to
!       <structname>capitals</>.
       </para>
      </listitem>
  
--- 1303,1313 ----
        Similarly, if we were to specify that
        <structname>cities</>.<structfield>name</> <literal>REFERENCES</> some
        other table, this constraint would not automatically propagate to
!       <structname>capitals</>.  However, it is possible to set up a 
!       foreign key such as <structname>capitals</>.<structfield>name</> 
!       <literal>REFERENCES</> <structname>states</>.<structfield>name</>. 
!       So it is possible to workaround this restriction by manually adding
!       foreign keys to each child table.
       </para>
      </listitem>
  
***************
*** 1251,1260 ****
--- 1320,1922 ----
      </listitem>
     </itemizedlist>
  
+       Some statements are not optimised in the same way for inherited tables
+       as they are for normal tables. Examples include:
+ 
+    <itemizedlist>
+     <listitem>
+      <para>MIN/MAX aggregates with no GROUP BY
+ <programlisting>
+ SELECT MIN(foo) FROM bar;
+ </programlisting>
+      </para>
+     </listitem>
+ 
+     <listitem>
+      <para>Retrieving rows from an ordered result set using an index
+ <programlisting>
+ SELECT foo FROM bar ORDER BY foo LIMIT 5;
+ </programlisting>
+      </para>
+     </listitem>
+ 
+    </itemizedlist>
+ 
     These deficiencies will probably be fixed in some future release,
     but in the meantime considerable care is needed in deciding whether
     inheritance is useful for your problem.
+ 
    </para>
+ 
+   <note>
+    <title>Deprecated</title> 
+    <para>
+      In previous versions of <productname>PostgreSQL</productname>, the
+      default behavior was not to include child tables in queries. This was
+      found to be error prone and is also in violation of the SQL:2003
+      standard. Under the old syntax, to get the sub-tables you append
+      <literal>*</literal> to the table name.
+      For example
+ <programlisting>
+ SELECT * from cities*;
+ </programlisting>
+      You can still explicitly specify scanning child tables by appending
+      <literal>*</literal>, as well as explicitly specify not scanning child tables by
+      writing <quote>ONLY</quote>.  But beginning in version 7.1, the default
+      behavior for an undecorated table name is to scan its child tables
+      too, whereas before the default was not to do so.  To get the old
+      default behavior, set the configuration option
+      <literal>SQL_Inheritance</literal> to off, e.g.,
+ <programlisting>
+ SET SQL_Inheritance TO OFF;
+ </programlisting>
+      or add a line in your <filename>postgresql.conf</filename> file.
+    </para>
+   </note>
+ 
+   </sect1>
+ 
+   <sect1 id="ce-partitioning">
+    <title>Constraint Exclusion and Partitioning</title>
+ 
+    <indexterm>
+     <primary>partitioning</primary>
+    </indexterm>
+ 
+    <indexterm>
+     <primary>constraint exclusion</primary>
+    </indexterm>
+ 
+    <para>
+     <productname>PostgreSQL</productname> provides features required
+     to support basic table partitioning. This section describes why and how
+     you can implement this as part of your database design. 
+    </para>
+ 
+    <sect2 id="ce-partitioning-overview">
+      <title>Partitioning Overview</title>
+ 
+    <para>
+     Currently, partitioning is implemented in conjunction with table
+     inheritance only, though using fully SQL:2003 compliant syntax. 
+     Table inheritance gives us the ability to split up a table into partitions
+     and the Constraint Exclusion feature provides the ability to selectively
+     bring partitions back together in response to particular SELECT statements.
+     You should be familiar with Inheritance (see <xref linkend="ddl-inherit">)
+     before attempting to implement partitioning.
+    </para>
+ 
+    <para>
+     Partitioning can provide a number of benefits.
+    <itemizedlist>
+     <listitem>
+      <para>
+       Query performance can be improved dramatically for certain kinds
+       of query without the need for maintaining costly indexes.
+      </para>
+     </listitem>
+ 
+     <listitem>
+      <para>
+       Insert performance can be improved by breaking down a large
+       index into multiple pieces. When an index no longer fits easily in memory,
+       both read and write operations on the index take progressively more 
+       disk accesses.
+      </para>
+     </listitem>
+ 
+     <listitem>
+      <para>
+       Bulk deletes may be avoided altogether by simply removing one of the
+       partitions, if that requirement is planned into the partitioning design.
+      </para>
+     </listitem>
+ 
+     <listitem>
+      <para>
+       Seldom used data can be migrated to cheaper and slower storage media.
+      </para>
+     </listitem>
+ 
+    </itemizedlist>
+ 
+     The benefits will normally be worthwhile only when a data table would
+     otherwise be very large. That is for you to judge, though would not
+     usually be lower than the size of physical RAM on the database server.
+     Partitioning can be of use in any type of application. Examples range from
+     data warehousing applications with large historical data stores through
+     to line-of-business applications such as order-entry, CRM or ERP systems.
+    </para>
+ 
+    <para>
+     In &version; the following partitioning types are supported:
+    <itemizedlist>
+     <listitem>
+      <para>
+       "Range Partitioning" where the table is partitioned along a "range" 
+       defined by a single column or set of columns, with no overlap between 
+       partitions. Examples might be a date range (e.g. LogDate) or a range of
+       identifiers for particular business objects (e.g. OrderId).
+      </para>
+     </listitem>
+ 
+     <listitem>
+      <para>
+       "List Partitioning" where the table is partitioned by explicitly listing
+       which values relate to each partition. 
+      </para>
+     </listitem>
+ 
+     <listitem>
+      <para>
+       Mixed range and list partitioning. No restrictions are placed on the 
+       number or complexity of the partitioning constraints.
+      </para>
+     </listitem>
+ 
+     <listitem>
+      <para>
+       Hash partitioning is not currently supported.
+      </para>
+     </listitem>
+    </itemizedlist>
+    </para>
+ 
+     <para>
+      In the next section <xref linkend="ce-partitioning-implementation"> we
+      discuss how to create the DDL required for partitioning. We then move on
+      to a discussion of the Constraint Exclusion feature which allows
+      query performance to be improved, in 
+      <xref linkend="constraint-exclusion-queries">
+     </para>
+ 
+    </sect2>
+ 
+ 
+    <sect2 id="ce-partitioning-implementation">
+      <title>Implementing Partitioning</title>
+ 
+     <para>
+      Partitioning a table is a straightforward process.  There
+      are a wide range of options for you to consider, so judging exactly
+      when and how to implement partitioning is a more complex topic. We 
+      will address that complexity primarily through the examples in this 
+      section.
+     </para>
+  
+     <para>
+      <orderedlist spacing=compact>
+       <listitem>
+        <para>
+         Create the "master" table, from which all of the partitions of that
+         will master will inherit.
+        </para>
+        <para>
+         This table will contain no data.  Do not define any 
+         constraints or keys on this table at this time, unless you intend them
+         to be applied equally to all partitions.
+        </para>
+       </listitem>
+ 
+       <listitem>
+        <para>
+         Create several "child" tables using table inheritance to create
+         identical copies of the master table.
+        </para>
+        <para>
+         We will refer to the child tables as partitions, though they are in
+         every way just normal PostgreSQL data tables. It is advisable to 
+         devise a progressive naming scheme for the partitions.
+        </para>
+       </listitem>
+ 
+       <listitem>
+        <para>
+         Add table constraints to define the allowed values in each partition.
+        </para>
+        <para>
+         Only clauses of the form [COLUMN] [OPERATOR] [CONSTANT(s)] will be used
+         for constraint exclusion. Simple examples would be:
+ <programlisting>
+ CHECK ( x = 1 )
+ CHECK ( county IN ('Oxfordshire','Buckinghamshire','Warwickshire'))
+ CHECK ( outletID BETWEEN 1 AND 99 )
+ </programlisting>
+ 
+         These can be linked together with boolean operators AND and OR to 
+         form complex constraints. Note that there is no difference in syntax
+         between Range and List Partitioning mechanisms; those terms are
+         descriptive only. Ensure that the set of values in each child table 
+         do not overlap.
+        </para>
+       </listitem>
+ 
+       <listitem>
+        <para>
+         Add any other indexes you want on the partitions, bearing in mind
+         that it is always more efficient to add them after data has been
+         loaded into the partition.
+        </para>
+       </listitem>
+ 
+       <listitem>
+        <para>
+         (optional) Define a RULE or trigger which tells the database which
+         partition in which to put new data.
+        </para>
+       </listitem>
+ 
+      </orderedlist>
+     </para>
+ 
+ 
+     <para>
+      Now lets look at the process again using a worked example of a database
+      for a large ice cream company. The company measures peak temperatures 
+      every day as well as ice cream sales in each region. They have two tables:
+   
+ <programlisting>
+ CREATE TABLE cities (
+     id              int not null,
+     name            text not null,
+     altitude        int     -- (in ft)
+ );
+ 
+ CREATE TABLE measurement (
+     city_id         int not null,
+     logdate         date not null,
+     peaktemp        int,
+     unitsales       int
+ );
+ </programlisting>
+ 
+      As time goes on, sales data is entered. Storing data costs money, 
+      yet older data has less value as ice cream fashions change, so we decide
+      to keep only 3 years data on a rolling 36 months. Each month we remove 
+      the oldest month's data.
+     </para>
+ 
+     <para>
+      Most queries just access the last week, month or quarter's data,
+      since we need to keep track of sales. As a result we have a large table,
+      yet only the most frequent 10% is accessed. Most of these queries
+      are online reports for various levels of management. These queries access
+      much of the table, so it is difficult to build enough indexes and at
+      the same time allow us to keep loading all of the data fast enough. 
+      Yet, the reports are online so we need to respond quickly.
+     </para>
+ 
+     <para>
+      In this situation we can use partitioning to help us meet all of our
+      different requirements for the measurements table.
+     </para>
+ 
+     <para>
+      (Step 1) The measurement table is our master table. 
+     </para>
+ 
+     <para>
+      (Step 2) Next we create 36 month-size partitions, using the PostgreSQL
+      inheritance feature like this:
+     
+ <programlisting>
+ CREATE TABLE measurement_yy04mm02 ( ) INHERITS (measurement);
+ CREATE TABLE measurement_yy04mm03 ( ) INHERITS (measurement);
+ ...
+ CREATE TABLE measurement_yy05mm11 ( ) INHERITS (measurement);
+ CREATE TABLE measurement_yy05mm12 ( ) INHERITS (measurement);
+ CREATE TABLE measurement_yy06mm01 ( ) INHERITS (measurement);
+ </programlisting>
+ 
+      Each of the tables are complete tables in their own right, but they 
+      inherit their definition from the measurement table.
+     </para>
+ 
+     <para>
+      This solves one of our problems: deleting older data. Each month, all
+      we need to do is perform a DROP TABLE on the oldest table and create
+      a new table to insert into. 
+     </para>
+ 
+     <para>
+      (Step 3) We now add non-overlapping table constraints, so that our table
+      creation script becomes:
+ 
+  <programlisting>
+ CREATE TABLE measurement_yy04mm02 ( 
+     CHECK ( logdate >= DATE '2004-02-01' AND logdate < DATE '2004-03-01' )
+                                     ) INHERITS (measurement);
+ CREATE TABLE measurement_yy04mm03 ( 
+     CHECK ( logdate >= DATE '2004-03-01' AND logdate < DATE '2004-04-01' )
+                                     ) INHERITS (measurement);
+ ...
+ CREATE TABLE measurement_yy05mm11 ( 
+     CHECK ( logdate >= DATE '2005-11-01' AND logdate < DATE '2005-12-01' )
+                                     ) INHERITS (measurement);
+ CREATE TABLE measurement_yy05mm12 ( 
+     CHECK ( logdate >= DATE '2005-12-01' AND logdate < DATE '2006-01-01' )
+                                     ) INHERITS (measurement);
+ CREATE TABLE measurement_yy06mm01 ( 
+     CHECK ( logdate >= DATE '2006-01-01' AND logdate < DATE '2006-02-01' )
+                                     ) INHERITS (measurement);
+ </programlisting>
+ 
+      Note that we use constants in the CHECK constraints rather than a 
+      calculation involving INTERVALS. 
+     </para>
+ 
+     <para>
+      (Step 4) We choose not to add further indices at this time.
+     </para>
+ 
+     <para>
+      (Step 5) Data will be added each day to the latest partition. This allows
+      us to set up a very simple RULE to insert data. We must 
+      redefine this each month so that it always points to the current partition.
+ 
+ <programlisting>
+ CREATE OR REPLACE RULE measurement_current_partition AS 
+ ON INSERT
+ TO measurement
+ DO INSTEAD 
+     INSERT INTO measurement_yy06mm01 VALUES ( NEW.city_id
+                                              ,NEW.logdate
+                                              ,NEW.peaktemp
+                                              ,NEW.unitsales );
+ </programlisting>
+ 
+     We might want to insert data and have the server automatically 
+     locate the partition into which the row should be added. We could do this
+     with a more complex set of RULEs as shown below.
+ 
+ <programlisting>
+ CREATE RULE measurement_insert_yy04mm02 AS 
+ ON INSERT
+ TO MEASUREMENT WHERE
+     ( logdate >= DATE '2004-02-01' AND logdate < DATE '2004-03-01' )
+ DO INSTEAD 
+     INSERT INTO measurement_yy04mm02 VALUES ( NEW.city_id
+                                              ,NEW.logdate
+                                              ,NEW.peaktemp
+                                              ,NEW.unitsales );
+ ...
+ CREATE RULE measurement_insert_yy05mm12 AS 
+ ON INSERT
+ TO MEASUREMENT WHERE
+     ( logdate >= DATE '2005-12-01' AND logdate < DATE '2006-01-01' )
+ DO INSTEAD 
+     INSERT INTO measurement_yy05mm12 VALUES ( NEW.city_id
+                                              ,NEW.logdate
+                                              ,NEW.peaktemp
+                                              ,NEW.unitsales );
+ CREATE RULE measurement_insert_yy06mm01 AS 
+ ON INSERT
+ TO MEASUREMENT WHERE
+     ( logdate >= DATE '2006-01-01' AND logdate < DATE '2006-02-01' )
+ DO INSTEAD 
+     INSERT INTO measurement_yy06mm01 VALUES ( NEW.city_id
+                                              ,NEW.logdate
+                                              ,NEW.peaktemp
+                                              ,NEW.unitsales );
+ </programlisting>
+ 
+      Note that the WHERE clauses in each RULE exactly match those used
+      for the CHECK constraints on each partition. 
+     </para>
+ 
+     <para>
+      As we can see, a complex partitioning scheme could require a substantial
+      amount of DDL. In the above example we would be creating a new partition
+      each month, so it would be advisable to create a script/program that 
+      generates the SQL DDL required automatically rather than manually
+      maintaining all of the required code.
+     </para>
+ 
+    <para>
+     The following caveats apply:
+    <itemizedlist>
+     <listitem>
+      <para>
+       There is currently no way to specify that all of the CHECK constraints
+       are mutually exclusive. Care is required by the database designer.
+      </para>
+     </listitem>
+ 
+     <listitem>
+      <para>
+       There is currently no way to specify that rows may not be inserted
+       into the master table. Any CHECK constraint on the master
+       table will be inherited by all child tables, so could not be used for
+       that purpose.
+      </para>
+     </listitem>
+ 
+     <listitem>
+      <para>
+       For some datatypes you must explicitly coerce the constant values
+       into the datatype of the column. The following constraint will 
+       work if x is an INTEGER datatype, but not if x is BIGINT datatype.
+ <programlisting>
+ CHECK ( x = 1)
+ </programlisting>
+       For BIGINT we must use a constraint like:
+ <programlisting>
+ CHECK ( x = 1::bigint)
+ </programlisting>
+       The issue is not restricted to BIGINT datatypes but can occur whenever
+       the default datatype of the constant does not match the datatype of
+       the column to which it is being compared.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       There is currently no way to specify that rows may not be inserted
+       into the master table. If you define a CHECK constraint on the master
+       table, then this constraint will be inherited by all child tables. All
+       queries will then return no rows.
+      </para>
+     </listitem>
+ 
+     <listitem>
+      <para>
+       Partitioning can also be arranged using a UNION ALL view, such as the 
+       following view. 
+ 
+ <programlisting>
+ CREATE VIEW measurement AS 
+           SELECT * FROM measurement_yy04mm02
+ UNION ALL SELECT * FROM measurement_yy04mm03
+ ...
+ UNION ALL SELECT * FROM measurement_yy05mm11
+ UNION ALL SELECT * FROM measurement_yy05mm12
+ UNION ALL SELECT * FROM measurement_yy06mm01;
+ </programlisting>
+ 
+       However, in this release this means of specifying a
+       partitioned table does not support Constraint Exclusion and is not
+       recommended.
+      </para>
+     </listitem>
+    </itemizedlist>
+    </para>
+ 
+ 
+    </sect2>
+ 
+    <sect2 id="constraint-exclusion-queries">
+      <title>Constraint Exclusion in Queries</title>
+ 
+    <para>
+     Partitioning can be used to improve query performance when used in 
+     conjunction with the Constraint Exclusion (CE) feature. As an example:
+ 
+ <programlisting>
+ SET constraint_exclusion=true;
+ SELECT count(*) FROM measurement WHERE logdate >= DATE '2006-01-01';
+ </programlisting>
+ 
+     Without Constraint Exclusion the above query would scan each of the
+     partitions of the measurement table. With Constraint Exclusion, the
+     planner will examine each of the constraints and try to prove that 
+     each of the partitions needs to be involved in the query. If the planner
+     is able to refute that for any partition, it excludes the partition 
+     from the query plan. 
+    </para>
+ 
+    <para>
+     You can use the EXPLAIN command to show the difference
+     between a plan with constraint_exclusion on and a plan with it off.
+ 
+ <programlisting>
+ SET constraint_exclusion=false;
+ EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2006-01-01';
+ 
+                                           QUERY PLAN
+ -----------------------------------------------------------------------------------------------
+  Aggregate  (cost=158.66..158.68 rows=1 width=0)
+    ->  Append  (cost=0.00..151.88 rows=2715 width=0)
+          ->  Seq Scan on measurement  (cost=0.00..30.38 rows=543 width=0)
+                Filter: (logdate >= '2006-01-01'::date)
+          ->  Seq Scan on measurement_yy04mm02 measurement  (cost=0.00..30.38 rows=543 width=0)
+                Filter: (logdate >= '2006-01-01'::date)
+          ->  Seq Scan on measurement_yy04mm03 measurement  (cost=0.00..30.38 rows=543 width=0)
+                Filter: (logdate >= '2006-01-01'::date)
+ ...
+          ->  Seq Scan on measurement_yy05mm12 measurement  (cost=0.00..30.38 rows=543 width=0)
+                Filter: (logdate >= '2006-01-01'::date)
+          ->  Seq Scan on measurement_yy06mm01 measurement  (cost=0.00..30.38 rows=543 width=0)
+                Filter: (logdate >= '2006-01-01'::date)
+ </programlisting>
+ 
+     Now when we enable CE, we get a significantly reduced plan, yet one that
+     returns exactly the same answer (provably).
+ 
+ <programlisting>
+ SET constraint_exclusion=true;
+ EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2006-01-01';
+                                           QUERY PLAN
+ -----------------------------------------------------------------------------------------------
+  Aggregate  (cost=63.47..63.48 rows=1 width=0)
+    ->  Append  (cost=0.00..60.75 rows=1086 width=0)
+          ->  Seq Scan on measurement  (cost=0.00..30.38 rows=543 width=0)
+                Filter: (logdate >= '2006-01-01'::date)
+          ->  Seq Scan on measurement_yy06mm01 measurement  (cost=0.00..30.38 rows=543 width=0)
+                Filter: (logdate >= '2006-01-01'::date)
+ </programlisting>
+ 
+     Don't forget that you still need to run ANALYZE on each partition
+     individually. A simple command like this
+ <programlisting>
+ ANALYZE measurement;
+ </programlisting>
+ 
+     does not ANALYZE each of the partitions, only the master table.
+    </para>
+ 
+    <para>
+     No indexes are required to make CE function. The partitions should be 
+     defined with appropriate CHECK constraints
+     and these are able to be compared against clauses on the query.
+    </para>
+ 
+    <para>
+     The following caveats apply to this release:
+    <itemizedlist>
+     <listitem>
+      <para>
+       CE only works when the query directly matches a constant. A
+       constant bound to a parameterised query will not work in the same way
+       since the plan is fixed and would need to vary with each execution.
+       Also, stable constants such as CURRENT_DATE may not be used, since
+       these are constant only for during the execution of a single query.
+       Joins conditions will not allow CE to work either.
+      </para>
+     </listitem>
+ 
+     <listitem>
+      <para>
+       UPDATEs and DELETEs against the partitioning master table do 
+       not perform constraint exclusion. One of the purposes of 
+       partitioning is to avoid bulk DELETE statements so you are advised
+       to rethink your design if you attempting this.
+      </para>
+     </listitem>
+ 
+     <listitem>
+      <para>
+       All constraints on all partitions of the master table are considered for
+       Constraint Exclusion, so large numbers of partitions are likely to 
+       increase query parse time considerably.
+      </para>
+     </listitem>
+ 
+    </itemizedlist>
+    </para>
+ 
+    </sect2>
+ 
   </sect1>
  
   <sect1 id="ddl-alter">

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq

Re: [PATCHES] Partitioning docs

Reply via email to