Pardon my ignorance as I've never actually used partitioning before but plan to in the near future, but couldn't the grammar resemble a common WHERE clause more closely? > ... PARTITION BY RANGE(order_date) > ( > START (date '2005-12-01') end (date '2007-12-01') > EVERY(interval '2 months') > ); >
PARTITION BY RANGE(order_date) ( WHERE order_date >= '2005-12-01' AND order_date < '2007-12-01' EVERY interval '2 months' ) OR PARTITION BY RANGE(order_date) ( WHERE order_date BETWEEN '2005-12-01' AND '2007-12-01' ) Of course using '>','>=','<','<=' instead of start/end eliminates any ambiguity along with the need for INCLUSIVE/EXCLUSIVE. > ... PARTITION BY LIST (state, deptno) > ( > VALUES ('OR', 1, 'WA', 1), > VALUES ('AZ', 1, 'UT', 1, 'NM', 1), > VALUES ('OR', 2, 'WA', 2), > VALUES ('AZ', 2, 'UT', 2, 'NM', 2), > PARTITION region_null VALUES (NULL, NULL), > PARTITION region_other > ); PARTITION BY LIST (state,deptno) ( PARTITION one WHERE state in ('OR', WA') AND deptno = 1 PARTITION two WHERE state in ('AZ', UT') AND deptno IN (1,2) PARTITION region_null WHERE state is null OR deptno is NULL PARTITION region_other ); Do you even need to list the columns in the PARTITION BY part? PARTITION BY LIST ( PARTITION one WHERE state in ('OR', WA') AND deptno = 1 PARTITION two WHERE state in ('AZ', UT') AND deptno IN (1,2) PARTITION region_null WHERE state is null OR deptno is NULL PARTITION region_other ); Is there really a reason to not have a named partition as well? Sure it saves a few keystrokes, but it makes trying to do anything with them at a later date that much more difficult. Your originally suggested grammar might be shorter to type, but using WHERE clause syntax we are all familiar with seems a lot more intuitive to me on the surface at least. Why not try to reuse grammar that already exists as much as possible? On Sat, 2008-01-12 at 00:19 +0100, Gavin Sherry wrote: > CREATE TABLE is modified to accept a PARTITION BY clause. This clause > contains one or more partition declarations. The syntax is as follows: > > PARTITION BY {partition_type} (column_name[, column_name...]) > [PARTITIONS number] > ( > partition_declaration[, partition_declaration...] > > ) > List > ---- > > ... PARTITION BY LIST (state) > (PARTITION q1_northwest VALUES ('OR', 'WA'), > PARTITION q1_southwest VALUES ('AZ', 'UT', 'NM'), > PARTITION q1_northeast VALUES ('NY', 'VM', 'NJ'), > PARTITION q1_southeast VALUES ('FL', 'GA'), > PARTITION q1_northcentral VALUES ('SD', 'WI'), > PARTITION q1_southcentral VALUES ('OK', 'TX')); > > Here, we produce 6 different partitions. The first partition groups > states in the North West of the USA. We introduce here the named > partition concept for clarity. > > Range > ----- > > Range has the most expressive grammar. I'll introduce it in steps: > > ... PARTITION BY RANGE (b) > ( > PARTITION aa start (date '2007-01-01') end (date '2008-01-01'), > PARTITION bb start (date '2008-01-01') end (date '2009-01-01') > ); > > Here, we create 2 partitions: aa and bb. Partition aa has the range > 2007-01-01 to 2008-01-01; partition bb has the range 2008-01-01 to > 2009-01-01. Intervals always have this problem: are the bounds included > in the range? To deal with this we define: the start of the range is > included in the range. The ending bound is not. This can be modified > with the keywords INCLUSIVE and EXCLUSIVE, which modify this property on > a rule by rule basis. > > It is common that these partitions follow a pattern, such as following > every week, month or year. So, we support the following specification: > > ... PARTITION BY RANGE(order_date) > ( > START (date '2005-12-01') end (date '2007-12-01') > EVERY(interval '2 months') > ); > > If we like, we can mix the specification a little: > > ... PARTITION BY RANGE(order_date) > ( PARTITION Q1_2005 end (date '2005-04-01'), > PARTITION Q2_2005 end (date '2005-07-01'), > PARTITION Q3_2005 end (date '2005-10-10'), > PARTITION Q4_2005 end (date '2006-01-01'), > START (date '2006-02-01') end (date '2008-04-01') > EVERY (interval '2 weeks') > ); > > an interesting result of the flexibility of the grammar we've come up > with is that you can do something like this: > > ... PARTITION BY RANGE(order_date) > ( PARTITION minny end date '2004-12-01'), > end (date '2006-12-01'), > PARTITION maxny start (date '2006-12-01') > ); > > Here, when order_date is less than 2004-12-01, we put the data in minny, > when it is between 2004-12-01 and 2006-12-01 we put it in an unnamed > partition and after this we put it in maxny. > > Tablespaces > ----------- > > We allow inline tablespace specification, such as: > > ... PARTITION BY RANGE(order_date) > ( > PARTITION minny TABLESPACE compress, > start (date '2004-12-01') end (date '2006-12-01') TABLESPACE hot, > PARTITION maxny TABLESPACE compress > ); > > I've used the term compress here intentionally. A number of operating > systems now ship file systems which can compress partitions. Users with > issues with the amount of data they want to keep online can delay the > time until they need new storage to a future date by compressing less > regularly used data with this technique, for a performance cost. Data > being used heavily can live on an uncompressed file system, affected. > > Multi-column support > -------------------- > > We can have multi-column partitions. > > ... PARTITION BY LIST (state, deptno) > ( > VALUES ('OR', 1, 'WA', 1), > VALUES ('AZ', 1, 'UT', 1, 'NM', 1), > VALUES ('OR', 2, 'WA', 2), > VALUES ('AZ', 2, 'UT', 2, 'NM', 2), > PARTITION region_null VALUES (NULL, NULL), > PARTITION region_other > ); > > Looking at this syntax now, I think I prefer: > > VALUES ('OR', 1),('WA', 1) > > To specify keys for the same partition. Thoughts? > > Composite partition support > --------------------------- > > Given that we're talking about systems with potentially very large > amounts of data, power users may want to combine range partitioning with > hash or list partitioning. For example, your analysis might always be on > a date range but also be broken down by sales office. So, this would > combine range and list partitioning (if the sales offices were known) or > hash partitioning (if they weren't known). > > To do this, we introduce the SUBPARTITION clause: > > ... PARTITION BY RANGE(order_date) SUBPARTITION BY HASH (office_id) > SUBPARTITIONS 8 > ( > start (date '2005-12-01') end (date '2007-12-01') > every (interval '3 months'), > start (date '2007-12-01') > end (date '2008-12-01') every (interval '1 month') > ); > > The first partition specification covers 8 partitions, the second 12 for > 20 partitions in total. Once we add the subpartitioning we have 160 > partitions in total (20 * 8). > > Subpartitioning by list can look like this (see templates below): > > ... PARTITION BY RANGE(order_date) SUBPARTITION BY LIST (customer_id) > ( > partition minny (subpartition c1 values (1), subpartition > c2 values (2)), > start (date '2004-12-01') end (date '2006-12-01') > (subpartition c1 values (1), subpartition c2 > values (2)), > partition maxy (values (1), values (2) > ) > > So, the list parameters of each sub partition look like arguments to the > primary partition. Again, see templates below if you think this looks > cumbersome. > > We do not preclude subpartitions of subpartitions. So, the following is > valid: > > ... PARTITION BY HASH(b) > PARTITIONS 2 > SUBPARTITION BY HASH(d) > SUBPARTITIONS 2, > SUBPARTITION BY HASH(e) SUBPARTITIONS 2, > SUBPARTITION BY HASH(f) SUBPARTITIONS 2, > SUBPARTITION BY HASH(g) SUBPARTITIONS 2, > SUBPARTITION BY HASH(h) SUBPARTITIONS 2; > > Subpartition templates > ---------------------- > > There are times we want subpartitions to be laid out in a specific way > for all partitions. To do this, we use templates: > > ... PARTITION BY RANGE (order_date) > SUBPARTITION BY LIST (state) > SUBPARTITION TEMPLATE > ( > SUBPARTITION northwest VALUES ('OR', 'WA'), > SUBPARTITION southwest VALUES ('AZ', 'UT', 'NM'), > SUBPARTITION northeast VALUES ('NY', 'VM', 'NJ'), > SUBPARTITION southeast VALUES ('FL', 'GA'), > SUBPARTITION northcentral VALUES ('SD', 'WI'), > SUBPARTITION southcentral VALUES ('OK', 'TX') > ) > (start (date '2001-01-01') end (date '2010-01-01') > every (interval '3 months') > ) > > For each of the 36 odd partitions we create here, each is subpartitioned > into geographical areas. > > Data management with ALTER > -------------------------- > > These are all arguments to ALTER TABLE. All of these require validation > against the existing specification. > > ADD > --- > > For range and list partitioning, it's important to be able to add > partitions for data not covered by the existing specification. So, we > propose: > > ... ADD PARTITION q1_2008 end (date '2008-04-01') > > COALESCE (maybe) > ---------------- > > For hash partitions, remove a partition from the number of hash > partitions and distribute its data to the remaining partitions. > > ... COALESCE PARTITION [name]; > > I'm not sure if this is really used but other systems we looked at have > it. Thoughts? > > DROP > ---- > > For list and range partitions, drop a specified partition from the set > of partitions. > > ... DROP PARTITION minny; > > This drops a named partition. Often, it will be difficult for users to > know partition names, and they might be unnamed. So, we allow this > syntax: > > ... DROP PARTITION FOR(date '2007-01-01'); > > for range partitions; and: > > ... DROP PARTITION FOR(VALUES('CA')); > > for list partitions. > > We've also discussed something like: > > ... DROP PARTITION FOR(POSITION(1)); > > so that users can easily drop a specific partition in an array of range > partitions. It seems to me, though, that the use case is generally to > drop the oldest partition so perhaps we should have a more explicit > syntax. Thoughts? > > EXCHANGE > -------- > > This sub-clause allows us to make a table a partition in a set of > partitions or take a partition out of a set but keep it as a table. IBM > uses ATTACH and DETACH, which is explicit but Oracle uses EXCHANGE. I'll > explain the latter: > > ... EXCHANGE <partition identifier> WITH TABLE <table name> > > partition identifier is one of PARTITION <name> or PARTITION FOR(...). > The partition in the partition set 'becomes' the table <table name> and > vice-versa. Essentially, we'd swap the relfilenodes. This means that we > have to first ADD PARTITION then swap the table and the partition. > Thoughts? > > MERGE > ----- > > You can merge and list partitions and any two range partitions: > > ... MERGE <partition id>, <partition id> [INTO PARTITION <partition name>] > > For range partitions: > > ... MERGE PARTITION FOR(date '2006_01_01'), PARTITION FOR(date '2007-01-01'); > > For list partitions: > > ... MERGE PARTITION FOR(VALUES('CA', 'MA') > > This begs the question of why we have COALESCE for hash partitioning. I > don't know, it just seems like the right thing since you can't merge two > hash partitions together (well, you shouldn't want to). > > RENAME > ------ > > Rename a partition. We can use partition name or FOR clause. > > SPLIT > ----- > > Split is used to divide a partition in two. It is designed for list and > range partitioning but I guess we could/should support hash. I need to > think about that. For RANGE partitions: > > ... SPLIT <partition id> <AT-clause> [INTO (PARTITION <partition name1>, > PARTITION <partition name2>)]; > > AT clause specifies the point at which the partition is split in two: > > ... SPLIT PARTITION FOR(2000) AT 1000 INTO PARTITION (part1000, > part2000) > > We might want ways to do this with unnamed partitions, it seems to me. > Thoughts? > > For list: > > ... SPLIT PARTITION region_east AT( VALUES ('CT', 'MA', 'MD') ) > INTO > ( > PARTITION region_east_1, > PARTITION region_east_2 > ); > > In this case, values from region_east specified in the AT() list are put in > region_east_1 and the rest are put in region_east_2. > > I think a better way for supporting split with hash is via ADD. I'm sure > some people think that ugly so I'd like feedback. > > TRUNCATE > -------- > > Truncate a specified partition: > > ... TRUNCATE PARTITION FOR ('2005-01-01') > > We could specify a name too. > > This will use truncate internally. > > ---------------------------(end of broadcast)--------------------------- > TIP 6: explain analyze is your friend -- Mike <[EMAIL PROTECTED]>
signature.asc
Description: This is a digitally signed message part