Re: [GENERAL] Table partitioning
Everything is slow about it - selects, deletes and inserts, that is. I don't do updates on that table. The inserts and deletes are less of an issue because they are done once a week. Of course it would be nicer if they were faster, but that's less of an issue. The real issue is with self-joins, which are a common query. But I have indexes on the relevant fields: the connecting field (the one used for the self-join) as well as the date field. The queries are mostly of the format SELECT ... FROM tablename t1 left join tablename t2 on t1.indexed_field = t2.indexed_field and t2.boolean_field where t1.date_field >= 'date1' and t1.date_field < 'date2' and not t1.boolean; This can take about 15 minutes to run, depending on the date range. And it doesn't matter much if I put a date range within the left join, either. As I mentioned in another message, I don't like the idea that my script may fail, and then inserts into the table might start to fail as well. I'm not always available to run it manually within a set time. And writing an automation that is different than all the other tables I maintain in that database makes for maintenance spaghetti. I also don't like running automated DDL commands. They don't play well with backups. -הודעה מקורית- מאת: Steve Crawford [mailto:scrawf...@pinpointresearch.com] נשלח: ב 28/10/2013 22:31 אל: Herouth Maoz; pgsql-general@postgresql.org נושא: Re: [GENERAL] Table partitioning On 10/28/2013 09:27 AM, Herouth Maoz wrote: > I have a rather large and slow table in Postgresql 9.1. I'm thinking of > partitioning it by months, but I don't like the idea of creating and dropping > tables all the time. What is slow about it? Inserting? Selecting? Deleting? Partitioning can assist with some issues but does no good if what you really need is an index or better query. Partitioning shines as an option to manage archiving/purging of time-series data but only if you work with it, not against it. What don't you like about creating and dropping tables? You can easily automate it: https://github.com/keithf4/pg_partman > > I'm thinking of simply creating 12 child tables, in which the check condition > will be, for example, date_part('month'', time_arrived) = 1 (or 2 for > February, 3 for March etc.). > > I'll just be deleting records rather than dropping tables, the same way I do > in my current setup. I delete a week's worth every time. You are missing out on one of the best aspects of partitioning. Compared to dropping or truncating a child table, deleting is far slower and causes table bloat which may impact future queries. > > Second, when I delete (not drop!) from the mother table, are records deleted > automatically from the child tables or do I need to create rules/triggers for > that? > Yes unless you use the keyword "ONLY": "If specified, deletes rows from the named table only. When not specified, any tables inheriting from the named table are also processed." Cheers, Steve
Re: [GENERAL] Table partitioning
On 10/28/2013 09:27 AM, Herouth Maoz wrote: I have a rather large and slow table in Postgresql 9.1. I'm thinking of partitioning it by months, but I don't like the idea of creating and dropping tables all the time. What is slow about it? Inserting? Selecting? Deleting? Partitioning can assist with some issues but does no good if what you really need is an index or better query. Partitioning shines as an option to manage archiving/purging of time-series data but only if you work with it, not against it. What don't you like about creating and dropping tables? You can easily automate it: https://github.com/keithf4/pg_partman I'm thinking of simply creating 12 child tables, in which the check condition will be, for example, date_part('month'', time_arrived) = 1 (or 2 for February, 3 for March etc.). I'll just be deleting records rather than dropping tables, the same way I do in my current setup. I delete a week's worth every time. You are missing out on one of the best aspects of partitioning. Compared to dropping or truncating a child table, deleting is far slower and causes table bloat which may impact future queries. Second, when I delete (not drop!) from the mother table, are records deleted automatically from the child tables or do I need to create rules/triggers for that? Yes unless you use the keyword "ONLY": "If specified, deletes rows from the named table only. When not specified, any tables inheriting from the named table are also processed." Cheers, Steve -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Table partitioning
On 2013-10-28 12:47, Herouth Maoz wrote: Thanks. Assuming there is an index on the time_arrived column, and that there are about 10.5 million records in each child table, how bad will performance be if the query actually accesses all the 12 tables? Will it be as bad as using the full table? In the case where you don't add a predicate to your select statement that matches your constraint checks, meaning your statements are visiting every child table, you're not going to experience any performance gains over leaving everything in one table (plus you're adding on extra maintenance overhead and likely degrading query performance, even if only by a small amount). What are your concerns with creating/dropping tables? You may have even better success with creating a child table for each week. Fifty-two extra tables is extra maintenance but the performance boost you could get by dividing out your data is probably significant, and that kind of maintenance should be automated anyway. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Table partitioning
Thanks. Assuming there is an index on the time_arrived column, and that there are about 10.5 million records in each child table, how bad will performance be if the query actually accesses all the 12 tables? Will it be as bad as using the full table? On 28/10/2013, at 18:31, Elliot wrote: > On 2013-10-28 12:27, Herouth Maoz wrote: >> I have a rather large and slow table in Postgresql 9.1. I'm thinking of >> partitioning it by months, but I don't like the idea of creating and >> dropping tables all the time. >> >> I'm thinking of simply creating 12 child tables, in which the check >> condition will be, for example, date_part('month'', time_arrived) = 1 (or 2 >> for February, 3 for March etc.). >> >> I'll just be deleting records rather than dropping tables, the same way I do >> in my current setup. I delete a week's worth every time. >> >> So, I have two questions. >> >> First, is constraint exclusion going to work with that kind of condition? I >> mean, if my WHERE clause says something like "time_arrived >= '2013-04-05' >> and time_arrived < '2013-04-17'", will it be able to tell that >> date_part("month",time_arrived) for all the records is 4, and therefore >> avoid selecting from any partitions other than the april one? >> >> Second, when I delete (not drop!) from the mother table, are records deleted >> automatically from the child tables or do I need to create rules/triggers >> for that? >> >> >> TIA, >> Herouth >> > 1. No - you'd need a condition like "where date_part("month", time_arrived) = > 1" in your select statements in order for the constraint exclusion to kick in > 2. Yes - there is no need to create rules or triggers for deletes on the > parent table (check out the syntax for "delete from " versus "delete > from only ) > -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Table partitioning
On 2013-10-28 12:27, Herouth Maoz wrote: I have a rather large and slow table in Postgresql 9.1. I'm thinking of partitioning it by months, but I don't like the idea of creating and dropping tables all the time. I'm thinking of simply creating 12 child tables, in which the check condition will be, for example, date_part('month'', time_arrived) = 1 (or 2 for February, 3 for March etc.). I'll just be deleting records rather than dropping tables, the same way I do in my current setup. I delete a week's worth every time. So, I have two questions. First, is constraint exclusion going to work with that kind of condition? I mean, if my WHERE clause says something like "time_arrived >= '2013-04-05' and time_arrived < '2013-04-17'", will it be able to tell that date_part("month",time_arrived) for all the records is 4, and therefore avoid selecting from any partitions other than the april one? Second, when I delete (not drop!) from the mother table, are records deleted automatically from the child tables or do I need to create rules/triggers for that? TIA, Herouth 1. No - you'd need a condition like "where date_part("month", time_arrived) = 1" in your select statements in order for the constraint exclusion to kick in 2. Yes - there is no need to create rules or triggers for deletes on the parent table (check out the syntax for "delete from " versus "delete from only ) -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Table partitioning
I have a rather large and slow table in Postgresql 9.1. I'm thinking of partitioning it by months, but I don't like the idea of creating and dropping tables all the time. I'm thinking of simply creating 12 child tables, in which the check condition will be, for example, date_part('month'', time_arrived) = 1 (or 2 for February, 3 for March etc.). I'll just be deleting records rather than dropping tables, the same way I do in my current setup. I delete a week's worth every time. So, I have two questions. First, is constraint exclusion going to work with that kind of condition? I mean, if my WHERE clause says something like "time_arrived >= '2013-04-05' and time_arrived < '2013-04-17'", will it be able to tell that date_part("month",time_arrived) for all the records is 4, and therefore avoid selecting from any partitions other than the april one? Second, when I delete (not drop!) from the mother table, are records deleted automatically from the child tables or do I need to create rules/triggers for that? TIA, Herouth -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Table Partitioning
So I worked around most of my errors. I removed the bigserial and used two of the columns as the primary key. I am now getting the following hibernate exception back: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1 This appears to be caused by the fact that the function is not returning back the row count. I did a google search and found a few suggestions on how to resolve this issue, but they do not seem to work well. I tried returning NEW, but that seems to cause the engine to also insert the record in the base table as well as a partition. Thus I end up with 120 records when I am expecting just 60. Any ideas on how I can fix this issue? Regards, Richard From: Richard Onorato To: Raghavendra Cc: "pgsql-general@postgresql.org" Sent: Wednesday, May 22, 2013 7:27 PM Subject: Re: [GENERAL] Table Partitioning Raghavendra, I am doing my inserts via Java JPA statements embedded in my Data Access Layer. I can share them if you would like to see them. Regards, Richard From: Raghavendra To: Richard Onorato Cc: "pgsql-general@postgresql.org" Sent: Wednesday, May 22, 2013 2:39 AM Subject: Re: [GENERAL] Table Partitioning On Wed, May 22, 2013 at 6:54 AM, Richard Onorato wrote: Were you able to get it to insert with the bigserial being used on the table? Yes. Every time I go to do an insert into one of the inherited tables I am now getting the following exception: >org.hibernate.HibernateException: The database returned no natively generated >identity value > > Hmm, I guess you are inserting on the parent table not directly into inherited table. Can you share the INSERT statement. Is auto-increment supported on table partitioning? > Yes, BIGSERIAL will create a sequence that will be shared by all child partitions. Check below example as per your test case, INSERT statement do not have BIGSERIAL column still its auto-increment and populated data in child tables. postgres=# insert into mymappingtable(c1,c2,c3,count,createdtime) values (9,20,30,1,now()); INSERT 0 0 postgres=# insert into mymappingtable(c1,c2,c3,count,createdtime) values (7,20,30,1,now()); INSERT 0 0 postgres=# select * from mymappingtablet5; id | c1 | c2 | c3 | count | createdtime ++++---+-- 8 | 9 | 20 | 30 | 1 | 2013-05-18 02:08:33.061548+05:30 postgres=# select * from mymappingtablet3; id | c1 | c2 | c3 | count | createdtime ++++---+-- 9 | 7 | 20 | 30 | 1 | 2013-05-18 02:12:03.076529+05:30 (1 row) (Request not to top-post please ... :) ) --- Regards, Raghavendra EnterpriseDB Corporation Blog: http://raghavt.blogspot.com/
Re: [GENERAL] Table Partitioning
Raghavendra, I am doing my inserts via Java JPA statements embedded in my Data Access Layer. I can share them if you would like to see them. Regards, Richard From: Raghavendra To: Richard Onorato Cc: "pgsql-general@postgresql.org" Sent: Wednesday, May 22, 2013 2:39 AM Subject: Re: [GENERAL] Table Partitioning On Wed, May 22, 2013 at 6:54 AM, Richard Onorato wrote: Were you able to get it to insert with the bigserial being used on the table? Yes. Every time I go to do an insert into one of the inherited tables I am now getting the following exception: >org.hibernate.HibernateException: The database returned no natively generated >identity value > > Hmm, I guess you are inserting on the parent table not directly into inherited table. Can you share the INSERT statement. Is auto-increment supported on table partitioning? > Yes, BIGSERIAL will create a sequence that will be shared by all child partitions. Check below example as per your test case, INSERT statement do not have BIGSERIAL column still its auto-increment and populated data in child tables. postgres=# insert into mymappingtable(c1,c2,c3,count,createdtime) values (9,20,30,1,now()); INSERT 0 0 postgres=# insert into mymappingtable(c1,c2,c3,count,createdtime) values (7,20,30,1,now()); INSERT 0 0 postgres=# select * from mymappingtablet5; id | c1 | c2 | c3 | count | createdtime ++++---+-- 8 | 9 | 20 | 30 | 1 | 2013-05-18 02:08:33.061548+05:30 postgres=# select * from mymappingtablet3; id | c1 | c2 | c3 | count | createdtime ++++---+-- 9 | 7 | 20 | 30 | 1 | 2013-05-18 02:12:03.076529+05:30 (1 row) (Request not to top-post please ... :) ) --- Regards, Raghavendra EnterpriseDB Corporation Blog: http://raghavt.blogspot.com/
Re: [GENERAL] Table Partitioning
On Wed, May 22, 2013 at 6:54 AM, Richard Onorato wrote: > Were you able to get it to insert with the bigserial being used on the > table? > Yes. > Every time I go to do an insert into one of the inherited tables I am now > getting the following exception: > org.hibernate.HibernateException: The database returned no natively > generated identity value > > Hmm, I guess you are inserting on the parent table not directly into inherited table. Can you share the INSERT statement. > Is auto-increment supported on table partitioning? > > Yes, BIGSERIAL will create a sequence that will be shared by all child partitions. Check below example as per your test case, INSERT statement do not have BIGSERIAL column still its auto-increment and populated data in child tables. postgres=# insert into mymappingtable(c1,c2,c3,count,createdtime) values (9,20,30,1,now()); INSERT 0 0 postgres=# insert into mymappingtable(c1,c2,c3,count,createdtime) values (7,20,30,1,now()); INSERT 0 0 postgres=# select * from mymappingtablet5; id | c1 | c2 | c3 | count | createdtime ++++---+-- 8 | 9 | 20 | 30 | 1 | 2013-05-18 02:08:33.061548+05:30 postgres=# select * from mymappingtablet3; id | c1 | c2 | c3 | count | createdtime ++++---+-- 9 | 7 | 20 | 30 | 1 | 2013-05-18 02:12:03.076529+05:30 (1 row) (Request not to top-post please ... :) ) --- Regards, Raghavendra EnterpriseDB Corporation Blog: http://raghavt.blogspot.com/
Re: [GENERAL] Table Partitioning
On Tuesday, May 21, 2013, Richard Onorato wrote: > I am wanting to partition my data based on a mod of one of the bigint > columns, but when I run my insert test all the data goes into the base > table and not the partitions. Here is what the table looks like: > This works for me, on 9.2.4. Can you show exactly how you do the inserts? CREATE TABLE MyMappingTableT1 (PRIMARY KEY (id), CHECK((c1 % 5) = 0)) > INHERITS (MyMappingTable); > > This is not going to work well. constraint_exclusion does not understand the modulus operator. Cheers, Jeff
Re: [GENERAL] Table Partitioning
Were you able to get it to insert with the bigserial being used on the table? Every time I go to do an insert into one of the inherited tables I am now getting the following exception: org.hibernate.HibernateException: The database returned no natively generated identity value Is auto-increment supported on table partitioning? Regards, Richard From: Raghavendra To: Richard Onorato Cc: "pgsql-general@postgresql.org" Sent: Tuesday, May 21, 2013 1:06 PM Subject: Re: [GENERAL] Table Partitioning On Tue, May 21, 2013 at 11:03 PM, Richard Onorato wrote: I am wanting to partition my data based on a mod of one of the bigint columns, but when I run my insert test all the data goes into the base table and not the partitions. Here is what the table looks like: > > >CREATE table MyMappingTable ( id bigserial NOT NULL, > c1 bigInt NOT NULL, > c2 bigInt NOT NULL, > c3 bigint NOT NULL, > count bigint DEFAULT 1, > createdTime timestamp with time zone >default CURRENT_TIMESTAMP, > CONSTRAINT MyMappingTable_index PRIMARY >KEY (id) ) >with (OIDS=FALSE); > > >CREATE TABLE MyMappingTableT1 (PRIMARY KEY (id), CHECK((c1 % 5) = 0)) INHERITS >(MyMappingTable); >CREATE TABLE MyMappingTableT2 (PRIMARY KEY (id), CHECK((c1 % 5) = 1)) INHERITS >(MyMappingTable); >CREATE TABLE MyMappingTableT3 (PRIMARY KEY (id), CHECK((c1 % 5) = 2)) INHERITS >(MyMappingTable); >CREATE TABLE MyMappingTableT4 (PRIMARY KEY (id), CHECK((c1 % 5) = 3)) INHERITS >(MyMappingTable); >CREATE TABLE MyMappingTableT5 (PRIMARY KEY (id), CHECK((c1 % 5) = 4)) INHERITS >(MyMappingTable); > > >Here is the trigger function that I added to the database: > > >CREATE OR REPLACE FUNCTION my_mapping_table_insert_trigger() >RETURNS trigger AS $$ >BEGIN > IF ( (NEW.c1 % 5) = 0 ) THEN > INSERT INTO MyMappingTableT1 VALUES (NEW.*); > ELSIF ( (NEW.c1 % 5) = 1 ) THEN > INSERT INTO MyMappingTableT2 VALUES (NEW.*); > ELSIF ( (NEW.c1 % 5) = 2 ) THEN > INSERT INTO MyMappingTableT3 VALUES (NEW.*); > ELSIF ( (NEW.c1 % 5) = 3 ) THEN > INSERT INTO MyMappingTableT4 VALUES (NEW.*); > ELSIF ( (NEW.c1 % 5) = 4 ) THEN > INSERT INTO MyMappingTableT5 VALUES (NEW.*); > ELSE > > RAISE EXCEPTION 'c1 mod out of range. Something wrong with the >my_mapping_table_insert_trigger() function!'; > END IF; > RETURN NULL; >END; >$$ >LANGUAGE plpgsql; > > >Here is the Trigger that I added to the table: > > >CREATE TRIGGER insert_my_mapping_table_trigger > BEFORE INSERT ON MyMappingTable > FOR EACH ROW EXECUTE PROCEDURE my_mapping_table_insert_trigger(); > > >SET constraint_exclusion = ON; > > > >Regards, > > >Richard I tried your test case, its working fine from my end and populating data properly to partition childs. insert into mymappingtable values (1,7,20,30,1,now()); insert into mymappingtable values (2,6,20,30,1,now()); insert into mymappingtable values (3,8,20,30,1,now()); insert into mymappingtable values (4,9,20,30,1,now()); insert into mymappingtable values (5,10,20,30,1,now()); postgres=# \dt+ MyMappingTable* List of relations Schema | Name | Type | Owner | Size | Description +--+---+--++- public | mymappingtable | table | postgres | 0 bytes | public | mymappingtablet1 | table | postgres | 8192 bytes | public | mymappingtablet2 | table | postgres | 8192 bytes | public | mymappingtablet3 | table | postgres | 8192 bytes | public | mymappingtablet4 | table | postgres | 8192 bytes | public | mymappingtablet5 | table | postgres | 8192 bytes | --- Regards, Raghavendra EnterpriseDB Corporation Blog: http://raghavt.blogspot.com/
Re: [GENERAL] Table Partitioning
Interesting. I wonder what I am doing wrong. I will try and setup the database again and see if I can get it to work. thank you for testing it out for me. Richard On May 21, 2013, at 1:06 PM, Raghavendra wrote: > On Tue, May 21, 2013 at 11:03 PM, Richard Onorato > wrote: > I am wanting to partition my data based on a mod of one of the bigint > columns, but when I run my insert test all the data goes into the base table > and not the partitions. Here is what the table looks like: > > CREATE table MyMappingTable ( id bigserial NOT NULL, > c1 bigInt NOT NULL, > c2 bigInt NOT NULL, > c3 bigint NOT NULL, > count bigint DEFAULT 1, > createdTime timestamp with time zone > default CURRENT_TIMESTAMP, > CONSTRAINT MyMappingTable_index PRIMARY > KEY (id) ) > with (OIDS=FALSE); > > CREATE TABLE MyMappingTableT1 (PRIMARY KEY (id), CHECK((c1 % 5) = 0)) > INHERITS (MyMappingTable); > CREATE TABLE MyMappingTableT2 (PRIMARY KEY (id), CHECK((c1 % 5) = 1)) > INHERITS (MyMappingTable); > CREATE TABLE MyMappingTableT3 (PRIMARY KEY (id), CHECK((c1 % 5) = 2)) > INHERITS (MyMappingTable); > CREATE TABLE MyMappingTableT4 (PRIMARY KEY (id), CHECK((c1 % 5) = 3)) > INHERITS (MyMappingTable); > CREATE TABLE MyMappingTableT5 (PRIMARY KEY (id), CHECK((c1 % 5) = 4)) > INHERITS (MyMappingTable); > > Here is the trigger function that I added to the database: > > CREATE OR REPLACE FUNCTION my_mapping_table_insert_trigger() > RETURNS trigger AS $$ > BEGIN > IF ( (NEW.c1 % 5) = 0 ) THEN > INSERT INTO MyMappingTableT1 VALUES (NEW.*); > ELSIF ( (NEW.c1 % 5) = 1 ) THEN > INSERT INTO MyMappingTableT2 VALUES (NEW.*); > ELSIF ( (NEW.c1 % 5) = 2 ) THEN > INSERT INTO MyMappingTableT3 VALUES (NEW.*); > ELSIF ( (NEW.c1 % 5) = 3 ) THEN > INSERT INTO MyMappingTableT4 VALUES (NEW.*); > ELSIF ( (NEW.c1 % 5) = 4 ) THEN > INSERT INTO MyMappingTableT5 VALUES (NEW.*); > ELSE > RAISE EXCEPTION 'c1 mod out of range. Something wrong with the > my_mapping_table_insert_trigger() function!'; > END IF; > RETURN NULL; > END; > $$ > LANGUAGE plpgsql; > > Here is the Trigger that I added to the table: > > CREATE TRIGGER insert_my_mapping_table_trigger > BEFORE INSERT ON MyMappingTable > FOR EACH ROW EXECUTE PROCEDURE my_mapping_table_insert_trigger(); > > SET constraint_exclusion = ON; > > Regards, > > Richard > > I tried your test case, its working fine from my end and populating data > properly to partition childs. > > insert into mymappingtable values (1,7,20,30,1,now()); > insert into mymappingtable values (2,6,20,30,1,now()); > insert into mymappingtable values (3,8,20,30,1,now()); > insert into mymappingtable values (4,9,20,30,1,now()); > insert into mymappingtable values (5,10,20,30,1,now()); > > postgres=# \dt+ MyMappingTable* > List of relations > Schema | Name | Type | Owner |Size| Description > +--+---+--++- > public | mymappingtable | table | postgres | 0 bytes| > public | mymappingtablet1 | table | postgres | 8192 bytes | > public | mymappingtablet2 | table | postgres | 8192 bytes | > public | mymappingtablet3 | table | postgres | 8192 bytes | > public | mymappingtablet4 | table | postgres | 8192 bytes | > public | mymappingtablet5 | table | postgres | 8192 bytes | > > --- > Regards, > Raghavendra > EnterpriseDB Corporation > Blog: http://raghavt.blogspot.com/ >
Re: [GENERAL] Table Partitioning
On Tue, May 21, 2013 at 11:03 PM, Richard Onorato wrote: > I am wanting to partition my data based on a mod of one of the bigint > columns, but when I run my insert test all the data goes into the base > table and not the partitions. Here is what the table looks like: > > CREATE table MyMappingTable ( id bigserial NOT NULL, > c1 bigInt NOT NULL, > c2 bigInt NOT NULL, > c3 bigint NOT NULL, > count bigint DEFAULT 1, > createdTime timestamp with time zone > default CURRENT_TIMESTAMP, > CONSTRAINT MyMappingTable_index > PRIMARY KEY (id) ) > with (OIDS=FALSE); > > CREATE TABLE MyMappingTableT1 (PRIMARY KEY (id), CHECK((c1 % 5) = 0)) > INHERITS (MyMappingTable); > CREATE TABLE MyMappingTableT2 (PRIMARY KEY (id), CHECK((c1 % 5) = 1)) > INHERITS (MyMappingTable); > CREATE TABLE MyMappingTableT3 (PRIMARY KEY (id), CHECK((c1 % 5) = 2)) > INHERITS (MyMappingTable); > CREATE TABLE MyMappingTableT4 (PRIMARY KEY (id), CHECK((c1 % 5) = 3)) > INHERITS (MyMappingTable); > CREATE TABLE MyMappingTableT5 (PRIMARY KEY (id), CHECK((c1 % 5) = 4)) > INHERITS (MyMappingTable); > > Here is the trigger function that I added to the database: > > CREATE OR REPLACE FUNCTION my_mapping_table_insert_trigger() > RETURNS trigger AS $$ > BEGIN > IF ( (NEW.c1 % 5) = 0 ) THEN > INSERT INTO MyMappingTableT1 VALUES (NEW.*); > ELSIF ( (NEW.c1 % 5) = 1 ) THEN > INSERT INTO MyMappingTableT2 VALUES (NEW.*); > ELSIF ( (NEW.c1 % 5) = 2 ) THEN > INSERT INTO MyMappingTableT3 VALUES (NEW.*); > ELSIF ( (NEW.c1 % 5) = 3 ) THEN > INSERT INTO MyMappingTableT4 VALUES (NEW.*); > ELSIF ( (NEW.c1 % 5) = 4 ) THEN > INSERT INTO MyMappingTableT5 VALUES (NEW.*); > ELSE > RAISE EXCEPTION 'c1 mod out of range. Something wrong with > the my_mapping_table_insert_trigger() function!'; > END IF; > RETURN NULL; > END; > $$ > LANGUAGE plpgsql; > > Here is the Trigger that I added to the table: > > CREATE TRIGGER insert_my_mapping_table_trigger > BEFORE INSERT ON MyMappingTable > FOR EACH ROW EXECUTE PROCEDURE my_mapping_table_insert_trigger(); > > SET constraint_exclusion = ON; > > Regards, > > Richard > I tried your test case, its working fine from my end and populating data properly to partition childs. insert into mymappingtable values (1,7,20,30,1,now()); insert into mymappingtable values (2,6,20,30,1,now()); insert into mymappingtable values (3,8,20,30,1,now()); insert into mymappingtable values (4,9,20,30,1,now()); insert into mymappingtable values (5,10,20,30,1,now()); postgres=# \dt+ MyMappingTable* List of relations Schema | Name | Type | Owner |Size| Description +--+---+--++- public | mymappingtable | table | postgres | 0 bytes| public | mymappingtablet1 | table | postgres | 8192 bytes | public | mymappingtablet2 | table | postgres | 8192 bytes | public | mymappingtablet3 | table | postgres | 8192 bytes | public | mymappingtablet4 | table | postgres | 8192 bytes | public | mymappingtablet5 | table | postgres | 8192 bytes | --- Regards, Raghavendra EnterpriseDB Corporation Blog: http://raghavt.blogspot.com/
[GENERAL] Table Partitioning
I am wanting to partition my data based on a mod of one of the bigint columns, but when I run my insert test all the data goes into the base table and not the partitions. Here is what the table looks like: CREATE table MyMappingTable ( id bigserial NOT NULL, c1 bigInt NOT NULL, c2 bigInt NOT NULL, c3 bigint NOT NULL, count bigint DEFAULT 1, createdTime timestamp with time zone default CURRENT_TIMESTAMP, CONSTRAINT MyMappingTable_index PRIMARY KEY (id) ) with (OIDS=FALSE); CREATE TABLE MyMappingTableT1 (PRIMARY KEY (id), CHECK((c1 % 5) = 0)) INHERITS (MyMappingTable); CREATE TABLE MyMappingTableT2 (PRIMARY KEY (id), CHECK((c1 % 5) = 1)) INHERITS (MyMappingTable); CREATE TABLE MyMappingTableT3 (PRIMARY KEY (id), CHECK((c1 % 5) = 2)) INHERITS (MyMappingTable); CREATE TABLE MyMappingTableT4 (PRIMARY KEY (id), CHECK((c1 % 5) = 3)) INHERITS (MyMappingTable); CREATE TABLE MyMappingTableT5 (PRIMARY KEY (id), CHECK((c1 % 5) = 4)) INHERITS (MyMappingTable); Here is the trigger function that I added to the database: CREATE OR REPLACE FUNCTION my_mapping_table_insert_trigger() RETURNS trigger AS $$ BEGIN IF ( (NEW.c1 % 5) = 0 ) THEN INSERT INTO MyMappingTableT1 VALUES (NEW.*); ELSIF ( (NEW.c1 % 5) = 1 ) THEN INSERT INTO MyMappingTableT2 VALUES (NEW.*); ELSIF ( (NEW.c1 % 5) = 2 ) THEN INSERT INTO MyMappingTableT3 VALUES (NEW.*); ELSIF ( (NEW.c1 % 5) = 3 ) THEN INSERT INTO MyMappingTableT4 VALUES (NEW.*); ELSIF ( (NEW.c1 % 5) = 4 ) THEN INSERT INTO MyMappingTableT5 VALUES (NEW.*); ELSE RAISE EXCEPTION 'c1 mod out of range. Something wrong with the my_mapping_table_insert_trigger() function!'; END IF; RETURN NULL; END; $$ LANGUAGE plpgsql; Here is the Trigger that I added to the table: CREATE TRIGGER insert_my_mapping_table_trigger BEFORE INSERT ON MyMappingTable FOR EACH ROW EXECUTE PROCEDURE my_mapping_table_insert_trigger(); SET constraint_exclusion = ON; Regards, Richard
Re: [GENERAL] Table Partitioning Advice Request
2009/12/18 Scott Marlowe : > The main reason to avoid rules is that they're much less efficient > than triggers. We use partitioning at work for our stats db, and > partition by day, and we have about 2 years worth of stats data, so > our insert trigger has about 700 if / elseif / else conditions in it, > and the trigger is still very very fast. I tried it with rules before > and it was way too slow. I still don't understand the poor efficiency of the rule system. I presume it's in the implementation, not the concept. Well, for the sake of efficiency I would like to avoid that if/elseif chain with an extra support table to look for the right partition. Unless you have a different advise against dynamic SQL (EXECUTE...) to define the partition table name. > Note that the absolute fastest way to insert to the right child table > is to do it from the application side, choosing the right child table > there. I definitely disagree on this advise. For the sake of speed I could also run some library level (and thus application level) database solution. The main objective for using an RDBMS (at least in my view) is to gather in a single place (the RDBMS) the data storage and management stuff. Moving policies to the application level would just scatter that stuff with faster but more error prone solutions. And the aplpication needs to know the partitioning policies. > We automated our trigger creation. While the trigger is huge in terms > of number of lines, speed wise the creation of the new trigger each > night at midnight is measured in milliseconds Correct. There is still another point for me to investigate into this partitioning area. When the partitions get created. I see from your answers that everyone is expecting the partitions to be already in place at the moment the trigger or the rule get fired. In my application you can generate events for the past or the future where there's no partition yet. -- Vincenzo Romano NON QVIETIS MARIBVS NAVTA PERITVS -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Table Partitioning Advice Request
On Thu, Dec 17, 2009 at 8:29 AM, Vincenzo Romano wrote: > 2009/12/17 Sam Jas >> >> Rule is not advisable, Trigger is the best solution. > > Does the trigger solution need the TABLE CHECK constraint? > It looks to me it won't. The table check constraint is used during selects to pick just the child tables needed to deliver the data you are asking for, assuming the key value is in the where clause. The main reason to avoid rules is that they're much less efficient than triggers. We use partitioning at work for our stats db, and partition by day, and we have about 2 years worth of stats data, so our insert trigger has about 700 if / elseif / else conditions in it, and the trigger is still very very fast. I tried it with rules before and it was way too slow. Note that the absolute fastest way to insert to the right child table is to do it from the application side, choosing the right child table there. We automated our trigger creation. While the trigger is huge in terms of number of lines, speed wise the creation of the new trigger each night at midnight is measured in milliseconds -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Table Partitioning Advice Request
2009/12/17 Dimitri Fontaine : > Vincenzo Romano writes: >> Is there any performance study for the trigger-based implementation? > > Consider that if you use RULE to partition, when you DROP a partition > the INSERTs are locked out because the query depends on the table being > droped. > > That alone could lead you to stop considering RULEs for partitioning. In that case I would also drop the relevant rule(s). But it seems anyway that the rule system would need to scan all the rules in order to know which one(s) to apply. And, at least in my case, I would have hundreds of rules ... Thanks for the hint, anyway. -- Vincenzo Romano NON QVIETIS MARIBVS NAVTA PERITVS -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Table Partitioning Advice Request
Vincenzo Romano writes: > Is there any performance study for the trigger-based implementation? Consider that if you use RULE to partition, when you DROP a partition the INSERTs are locked out because the query depends on the table being droped. That alone could lead you to stop considering RULEs for partitioning. -- dim -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Table Partitioning Advice Request
2009/12/17 Sam Jas > > Rule is not advisable, Trigger is the best solution. Does the trigger solution need the TABLE CHECK constraint? It looks to me it won't. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Table Partitioning Advice Request
2009/12/17 David Fetter : >> You really think that calling and running a trigger function for every >> line is the best solution? > > Yes. The trigger function is choosing from a small subset of the > tables, or you know which tables exactly the rows are going into and > insert them there. So I understand that when a table has multiple RULEs, the planner just browse all of them to check the one(s) that will apply. What I have in mind is a "side table" with some bookkeeping data to be used by the trigger function to select the actual table to be used. Do you think this is faster than RULEs? Is there any performance study for the trigger-based implementation? > Oh, and please do trim, and don't top-post. I've fix this in this > post. Sorry! :-) -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Table Partitioning Advice Request
On Thu, Dec 17, 2009 at 02:41:40PM +0100, Vincenzo Romano wrote: > > 2009/12/17 Sam Jas > > Rule is not advisable, Trigger is the best solution. > If you have to choose among a couple hundred partition tables, the > trigger function body is far from trivial! It's possible to generate such code and deploy it automatically. > You really think that calling and running a trigger function for every > line is the best solution? Yes. The trigger function is choosing from a small subset of the tables, or you know which tables exactly the rows are going into and insert them there. Oh, and please do trim, and don't top-post. I've fix this in this post. Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Table Partitioning Advice Request
Why? If you have to choose among a couple hundred partition tables, the trigger function body is far from trivial! You really think that calling and running a trigger function for every line is the best solution? 2009/12/17 Sam Jas > > Rule is not advisable, Trigger is the best solution. > > --- > Thanks > Sam Jas > > --- On Thu, 17/12/09, Vincenzo Romano wrote: > > From: Vincenzo Romano > Subject: [GENERAL] Table Partitioning Advice Request > To: pgsql-general@postgresql.org > Date: Thursday, 17 December, 2009, 11:05 AM > > Hi all. > > I'm planning to implement table partitioning as "suggested" (among > other sources) in the official documentation. > I'm using v8.4.2 at the moment. > > My case is far from the general one as: > 1. I'll never UPDATE or DELETE rows from partitioned tables (only INSERTs) > 2. Rows will be inserted one-by-one or, in the worse case, in bunches > of two or three > 3. Partitioning will be based upon TIMESTAMP ranges > 4. The "virtual" tables should approach (and possibly go past) 100M rows > 5. Most (99%) of the INSERTs (and possibly SELECTs) will actually > operate on a rather small number of partitions (hardly more than 2). > > My main TABLE is like the following one: > > CREATE TABLE events ( > eventtype text not null, > item_id int8 not null, > event_date timestamp not null default now(), > row_date timestamp not null default now(), > event_id serial8 primary key > ); > > where the partitioning would happen over the values of the event_date column. > The row_date columns is to record the row creation TIMESTAMP as events > can be created relatively to the past, the future or the current time. > > In my mind a solution which is simple to maintain is to add a simple > RULE ... ON INSERT for every newly created partition table. > The TRIGGER approach, in my opinion, is much more complex to maintain > as either the body of the function needs to be rewritten as new > partitions are added, or > some external TABLE lookup is needed to choose the actual table name > to be used for a (dynamically created) INSERT. > > Now the questions. > > 1. As the number of RULEs will grow with the time, how will change > the efficiency of the query planner while "browsing" among the RULES? > 2. In the case the previous answer would lead to bad news for me, is > there any better/different approach to partitioning with TRIGGERs? > 3. Is there any more general advise for such approaches? > > -- > Vincenzo Romano > NON QVIETIS MARIBVS NAVTA PERITVS > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > > > The INTERNET now has a personality. YOURS! See your Yahoo! Homepage. -- Vincenzo Romano NotOrAnd Information Technologies cel. +39 339 8083886 | gtalk. vincenzo.rom...@notorand.it fix. +39 0823 454163 | skype. notorand.it fax. +39 02 700506964 | msn. notorand.it NON QVIETIS MARIBVS NAVTA PERITVS -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Table Partitioning Advice Request
Rule is not advisable, Trigger is the best solution. --- Thanks Sam Jas --- On Thu, 17/12/09, Vincenzo Romano wrote: From: Vincenzo Romano Subject: [GENERAL] Table Partitioning Advice Request To: pgsql-general@postgresql.org Date: Thursday, 17 December, 2009, 11:05 AM Hi all. I'm planning to implement table partitioning as "suggested" (among other sources) in the official documentation. I'm using v8.4.2 at the moment. My case is far from the general one as: 1. I'll never UPDATE or DELETE rows from partitioned tables (only INSERTs) 2. Rows will be inserted one-by-one or, in the worse case, in bunches of two or three 3. Partitioning will be based upon TIMESTAMP ranges 4. The "virtual" tables should approach (and possibly go past) 100M rows 5. Most (99%) of the INSERTs (and possibly SELECTs) will actually operate on a rather small number of partitions (hardly more than 2). My main TABLE is like the following one: CREATE TABLE events ( eventtype text not null, item_id int8 not null, event_date timestamp not null default now(), row_date timestamp not null default now(), event_id serial8 primary key ); where the partitioning would happen over the values of the event_date column. The row_date columns is to record the row creation TIMESTAMP as events can be created relatively to the past, the future or the current time. In my mind a solution which is simple to maintain is to add a simple RULE ... ON INSERT for every newly created partition table. The TRIGGER approach, in my opinion, is much more complex to maintain as either the body of the function needs to be rewritten as new partitions are added, or some external TABLE lookup is needed to choose the actual table name to be used for a (dynamically created) INSERT. Now the questions. 1. As the number of RULEs will grow with the time, how will change the efficiency of the query planner while "browsing" among the RULES? 2. In the case the previous answer would lead to bad news for me, is there any better/different approach to partitioning with TRIGGERs? 3. Is there any more general advise for such approaches? -- Vincenzo Romano NON QVIETIS MARIBVS NAVTA PERITVS -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general The INTERNET now has a personality. YOURS! See your Yahoo! Homepage. http://in.yahoo.com/
[GENERAL] Table Partitioning Advice Request
Hi all. I'm planning to implement table partitioning as "suggested" (among other sources) in the official documentation. I'm using v8.4.2 at the moment. My case is far from the general one as: 1. I'll never UPDATE or DELETE rows from partitioned tables (only INSERTs) 2. Rows will be inserted one-by-one or, in the worse case, in bunches of two or three 3. Partitioning will be based upon TIMESTAMP ranges 4. The "virtual" tables should approach (and possibly go past) 100M rows 5. Most (99%) of the INSERTs (and possibly SELECTs) will actually operate on a rather small number of partitions (hardly more than 2). My main TABLE is like the following one: CREATE TABLE events ( eventtype text not null, item_id int8 not null, event_date timestamp not null default now(), row_date timestamp not null default now(), event_id serial8 primary key ); where the partitioning would happen over the values of the event_date column. The row_date columns is to record the row creation TIMESTAMP as events can be created relatively to the past, the future or the current time. In my mind a solution which is simple to maintain is to add a simple RULE ... ON INSERT for every newly created partition table. The TRIGGER approach, in my opinion, is much more complex to maintain as either the body of the function needs to be rewritten as new partitions are added, or some external TABLE lookup is needed to choose the actual table name to be used for a (dynamically created) INSERT. Now the questions. 1. As the number of RULEs will grow with the time, how will change the efficiency of the query planner while "browsing" among the RULES? 2. In the case the previous answer would lead to bad news for me, is there any better/different approach to partitioning with TRIGGERs? 3. Is there any more general advise for such approaches? -- Vincenzo Romano NON QVIETIS MARIBVS NAVTA PERITVS -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Table Partitioning : Having child tables in multiple database servers
Hi, Ransika de Silva writes: > The client wants to have the freedom of increasing the processor power > AND the storage by introducing new database servers. I think the following document will be of interest: http://wiki.postgresql.org/wiki/Image:Moskva_DB_Tools.v3.pdf It presents the architecture Skype is using, and how to achieve what you need with plproxy and londiste and their other tools. -- dim -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Table Partitioning : Having child tables in multiple database servers
Hello, Many thanks for the quick response. Why I raised the previous question was to get an answer/solution for the following requirement; The client wants to have the freedom of increasing the processor power AND the storage by introducing new database servers. What is your thought on getting the above mentioned requirement satisfied. Thanking you and anticipating a response soon. @ Grzegorz Jaśkiewicz Apologies for directly sending you the mail. Regards Ransika 2009/7/7 Grzegorz Jaśkiewicz > you should ask the question on the list (press reply-all, not just > reply to my previous message please). > > > 2009/7/7 Ransika de Silva : > > Hello, > > Many thanks for the quick response. > > Why I raised the previous question was to get an answer/solution for the > > following requirement; > > The client wants to have the freedom of increasing the processor power > AND > > the storage by introducing new database servers. > > What is your thought on getting the above mentioned > requirement satisfied. > > Thanking you and anticipating a response soon. > > > Regards, > > Ransika > > > 2009/7/7 Grzegorz Jaśkiewicz > >> > >> On Tue, Jul 7, 2009 at 3:32 PM, Ransika de Silva > wrote: > >> > > >> > Hi all, > >> > We have got the Table Partitioning of PostgreSQL to work on one > database > >> > server, where the Parent Table and Inherited Child Tables are on one > (1) > >> > database server. > >> > The question which we need to get answered is, whether we can have the > >> > Child > >> > Tables in separate database servers? > >> > Can you please give us your expertise thoughts on this? > >> > >> no you can't > >> > >> > >> > >> > >> -- > >> GJ > > > > > > > > -- > > Ransika De Silva > > SCMAD 1.0, SCJP 1.4, > > BSc.(Hons) Information Systems > > > > > > -- > GJ > -- Ransika De Silva SCMAD 1.0, SCJP 1.4, BSc.(Hons) Information Systems
[GENERAL] Table Partitioning : Having child tables in multiple database servers
Hi all, We have got the Table Partitioning of PostgreSQL to work on one database server, where the Parent Table and Inherited Child Tables are on one (1) database server. The question which we need to get answered is, whether we can have the Child Tables in separate database servers? Can you please give us your expertise thoughts on this? Thanks & Regards -- Ransika De Silva
Re: [GENERAL] Table partitioning
=?ISO-8859-1?Q?Robert_Gravsj=F6?= writes: > Is there a reason to avoid populating the master table other that the > cost of the data always being included when querying? You can do it if you want ... I think most people think it complicates data management, but there's nothing in the system design that prevents it. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Table partitioning
While reading through the docs on Partitioning, http://www.postgresql.org/docs/current/static/ddl-partitioning.html, I got to wonder about the example given. The text in 5.9.2 states, item 1 in the first section, that the master table will contain no data. This is later confirmed by the example code where the trigger function, measurement_insert_trigger, redirects inserts to child tables and, failing to do so, raises an exception. Is there a reason to avoid populating the master table other that the cost of the data always being included when querying? My concern is that I would like to have the ELSE clause in measurement_insert_trigger just raise a log message and then return NEW. Am I missing some huge drawback? Regards, roppert -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Table partitioning and query plans
Hi all! I have set up a couple of log tables in our database to use table partitioning. So far it looks to be working pretty well, but I ran into a query plan issue that doesn't make sense to me at all. We have a table "omslog", and a set of partition tables "omslog_part_", where the "#..." stuff is a serial number. There is a rule that redirects inserts into omslog, to the most recent partition table. Pretty simple, and it works as advertised. If I select a min() or max() aggregate against an indexed field in omslog, where as before it would use the index, now it does not. However, the index will be used if I execute the same aggregate against one of the partition tables. The difference is demonstrated in the explains below. Doing a sequential scan on the base table makes absolutely no sense whatsoever by any stretch of the imagination; as you can see, there are about 9 million rows there, and the planner knows it (frequent analyze). What would cause the planner to be so silly in this instance? This is on PG 8.1.4. Would this happen to be something that an upgrade might fix? Thanks for any insight, -Glen database=# explain select min(sub_system) from omslog_part_0002; Result (cost=0.19..0.20 rows=1 width=0) InitPlan -> Limit (cost=0.00..0.19 rows=1 width=14) -> Index Scan using idx_omslog_part_0002_subsys on omslog_part_0002 (cost=0.00..24212.57 rows=129781 width=14) Filter: ((sub_system)::text IS NOT NULL) database=# explain select min(sub_system) from omslog; Aggregate (cost=539575.39..539575.40 rows=1 width=14) -> Append (cost=0.00..516884.11 rows=9076511 width=14) -> Seq Scan on omslog (cost=0.00..509342.09 rows=8946709 width=14) -> Seq Scan on omslog_part_0002 omslog (cost=0.00..7542.02 rows=129802 width=14) -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Table partitioning based on multiple criterias possible?
Ow Mun Heng wrote: On Wed, 2007-09-12 at 09:42 +0100, Richard Huxton wrote: Well, I suppose you could partition over (month,product) but you probably don't want to. Sounds fiddly to manage. Yeah.. well current schema w/ everything in 1 table and the way things are ordered, normalised design etc, is causing PG to calc stats wrongly and i get a 1 to 500x ratio between plan and actual analysed rows which is causing queries to run > 2 hours w/o returning results. (this was tested w/ on tables w/ only ~3 million rows and joining another table w/ another 3 million data - one to many relationship, I think, is causing the mis-estimates.) Well, let's see the query + explain. It's not in one of your previous posts, is it? PG doesn't know about cross-column correlations, but you can usually get better than 500x out. Since it's taking > 2 hours to get a result we'll have to settle for explain rather than explain-analyse. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Table partitioning based on multiple criterias possible?
On Wed, 2007-09-12 at 09:42 +0100, Richard Huxton wrote: > Ow Mun Heng wrote: > > New to Pg and wondering the extent of PG's table partitioning > > capability. > > > > I have a huge table > 18 million rows(growth rate ~8 million a week) > > which I like to implement partitioning. > > OK > > > Initially the plan is to just partition it by date. eg: 1 partition per > > month. > > Fair enough - do you do a lot of searching by date? Yeah.. the products we usually query on are by dates. default is 1 to 2 weeks period. So, there will be overlaps eg: 3rd week of April to 1 week of May.. which will hit 2 partitions > > > Now, I'm thinking if it's possible to also partition them by > > product/items. > > > > is it possible to do multilevel partitioning sort of like > > > > if month = July then go to July Parition > > if Product = Shampoo then go to shampoo partition > > elif Product = Soap then go to soap partition > > Well, I suppose you could partition over (month,product) but you > probably don't want to. Sounds fiddly to manage. Yeah.. well current schema w/ everything in 1 table and the way things are ordered, normalised design etc, is causing PG to calc stats wrongly and i get a 1 to 500x ratio between plan and actual analysed rows which is causing queries to run > 2 hours w/o returning results. (this was tested w/ on tables w/ only ~3 million rows and joining another table w/ another 3 million data - one to many relationship, I think, is causing the mis-estimates.) > > Have you verified that performance will be a problem without this? ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] Table partitioning based on multiple criterias possible?
Ow Mun Heng wrote: New to Pg and wondering the extent of PG's table partitioning capability. I have a huge table > 18 million rows(growth rate ~8 million a week) which I like to implement partitioning. OK Initially the plan is to just partition it by date. eg: 1 partition per month. Fair enough - do you do a lot of searching by date? Now, I'm thinking if it's possible to also partition them by product/items. is it possible to do multilevel partitioning sort of like if month = July then go to July Parition if Product = Shampoo then go to shampoo partition elif Product = Soap then go to soap partition Well, I suppose you could partition over (month,product) but you probably don't want to. Sounds fiddly to manage. Have you verified that performance will be a problem without this? -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: explain analyze is your friend
[GENERAL] Table partitioning based on multiple criterias possible?
New to Pg and wondering the extent of PG's table partitioning capability. I have a huge table > 18 million rows(growth rate ~8 million a week) which I like to implement partitioning. Initially the plan is to just partition it by date. eg: 1 partition per month. Now, I'm thinking if it's possible to also partition them by product/items. is it possible to do multilevel partitioning sort of like if month = July then go to July Parition if Product = Shampoo then go to shampoo partition elif Product = Soap then go to soap partition something like that.. Thanks ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] table partitioning pl/pgsql helpers
Hi Robert, Il giorno 01/giu/07, alle ore 04:08, Robert Treat ha scritto: [...] We I set these up for our clients, I typically seperate the partition creation piece from the data insertion piece. (Mostly as partition creation, especially with rules, is a table locking event, which is better done in a non-critical path). If you really must do it all in one go, you'll have a I see, and now I agree with you it's better to decouple partition creation and data insertion. much better chance of accomplishing this using all triggers (and triggers are better anyway)... i think you could do it with a combination of rules and a trigger (on insert to parent, create new parition and insert into it and delete from parent) but it would certainly need testing to make sure you dont have multi-rule evaluation... course since your making a trigger anyway... Even more importantly, I just discovered (trying and then reading pgsql docs) that the rule system is completely bypassed by the COPY FROM statement, so I think I'll rewrite everything using some sort of trigger-generating procedure because I want this stuff to work transparently (and we do lots of copy from). Thanks for the advices, Enrico Sirola [EMAIL PROTECTED]
Re: [GENERAL] table partitioning pl/pgsql helpers
On Wednesday 30 May 2007 12:55, Enrico Sirola wrote: > Hello, > > I'm trying to write a trigger on insert which should insert the row > in another > table. The table on which to insert the row should be selected at > runtime and > it is not know in advance. For example, let's say we have a table with > two columns, a date and an integer. a row is inserted into table XXX and > > CREATE TABLE XXX > ( > refdate date; > x2 integer; > ) > > when the statement > > insert into XXX VALUES ('2007-11-11', 1); > > is executed, a trigger (or rule) should be fired to insert the row into > table XXX_2007 (having the same schema). If the XXX_* tables are > created beforehand > this is not a problem because you can set up a rule for each date > performing > the needed insert (as documented in http://www.postgresql.org/docs/ > 8.1/interactive/ddl-partitioning.html) > > The problem arises when you try to extend the trigger in order to > also dinamically > perform table creation is the XXX_2007 doesn't exist: > > I have, for example: > > > CREATE OR REPLACE FUNCTION add_child_table(parent_name varchar, d date) > RETURNS varchar AS $$ > DECLARE > new_table_name varchar; > BEGIN >raise notice '%', 'creating table'; > new_table_name := date2tblname(parent_name, d); -- converts > table name and date into child table name > execute 'CREATE TABLE ' || new_table_name || ' ( ) INHERITS > (' || parent_name || ')'; > execute 'CREATE RULE ' > > || new_table_name > || '_insert AS ON INSERT TO ' > || parent_name > || ' WHERE ( refdate = DATE ' /* refdate is the > > field we use to partition */ > > || || d || > || ' ) DO INSTEAD INSERT INTO ' > || new_table_name > || ' VALUES ( NEW.* )'; > > return new_table_name; > END; > $$ LANGUAGE plpgsql; > > I can use the above to add a child table and the rule to implement > partitioning > > CREATE OR REPLACE FUNCTION child_creation_trigger() > RETURNS "trigger" AS $$ > BEGIN > IF ( child_exist(TG_RELNAME::text, NEW.refdate) = false ) THEN >raise notice '%', 'creating ' || TG_RELNAME::text || ' > for ' || NEW.refdate::text; >perform add_child_table(TG_RELNAME::text, NEW.refdate); >--insert into ' || child_table_name || ' values > ( NEW.* ); >RETURN NEW; > ELSE >raise notice '%', 'NOT creating ' || TG_RELNAME::text > > || ' for ' || NEW.refdate::text; > >RETURN NEW; > END IF; > END; > $$ LANGUAGE plpgsql; > > The trigger function is hooked to the to-be-partitioned table > > > CREATE TRIGGER XXX_trigger BEFORE INSERT > ON XXX FOR EACH ROW > execute procedure child_creation_trigger (); > > > now, every time I insert a tuple into XXX, the trigger is fired and > checks if the needed table exists or > not. If not, it creates the table and rule and goes on. The problem > is that in this case the first > row is inserted into the XXX table, not in the (just created) > XXX_ . Then I tried to insert > the row myself from the trigger body (and return null in order to > skip the original insertion), but > I'm not able to do it (see the commented insert in the above IF > clause) because I can't properly > quote the target table name. > I shoud perform a > > insert into child_table_name values (NEW.*); > > obviously written like this the plpgsql complains at runtime because > child_table_name is not a table name. > If, on the other side, I dynamically create the query like in > > execute 'insert into ' || child_table_name || ' values (NEW.*)'; > > it complains because NEW in the execution context is unknown. > > This should be a rather common problem... Isn't it? Is there a > canonical way to > solve it? Maybe there's a trivial answer, but I have no plpgsql > programming > experience. > We I set these up for our clients, I typically seperate the partition creation piece from the data insertion piece. (Mostly as partition creation, especially with rules, is a table locking event, which is better done in a non-critical path). If you really must do it all in one go, you'll have a much better chance of accomplishing this using all triggers (and triggers are better anyway)... i think you could do it with a combination of rules and a trigger (on insert to parent, create new parition and insert into it and delete from parent) but it would certainly need testing to make sure you dont have multi-rule evaluation... course since your making a trigger anyway... -- Robert Treat Database Architect http://www.omniti.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
[GENERAL] table partitioning pl/pgsql helpers
Hello, I'm trying to write a trigger on insert which should insert the row in another table. The table on which to insert the row should be selected at runtime and it is not know in advance. For example, let's say we have a table with two columns, a date and an integer. a row is inserted into table XXX and CREATE TABLE XXX ( refdate date; x2 integer; ) when the statement insert into XXX VALUES ('2007-11-11', 1); is executed, a trigger (or rule) should be fired to insert the row into table XXX_2007 (having the same schema). If the XXX_* tables are created beforehand this is not a problem because you can set up a rule for each date performing the needed insert (as documented in http://www.postgresql.org/docs/ 8.1/interactive/ddl-partitioning.html) The problem arises when you try to extend the trigger in order to also dinamically perform table creation is the XXX_2007 doesn't exist: I have, for example: CREATE OR REPLACE FUNCTION add_child_table(parent_name varchar, d date) RETURNS varchar AS $$ DECLARE new_table_name varchar; BEGIN raise notice '%', 'creating table'; new_table_name := date2tblname(parent_name, d); -- converts table name and date into child table name execute 'CREATE TABLE ' || new_table_name || ' ( ) INHERITS (' || parent_name || ')'; execute 'CREATE RULE ' || new_table_name || '_insert AS ON INSERT TO ' || parent_name || ' WHERE ( refdate = DATE ' /* refdate is the field we use to partition */ || || d || || ' ) DO INSTEAD INSERT INTO ' || new_table_name || ' VALUES ( NEW.* )'; return new_table_name; END; $$ LANGUAGE plpgsql; I can use the above to add a child table and the rule to implement partitioning CREATE OR REPLACE FUNCTION child_creation_trigger() RETURNS "trigger" AS $$ BEGIN IF ( child_exist(TG_RELNAME::text, NEW.refdate) = false ) THEN raise notice '%', 'creating ' || TG_RELNAME::text || ' for ' || NEW.refdate::text; perform add_child_table(TG_RELNAME::text, NEW.refdate); --insert into ' || child_table_name || ' values ( NEW.* ); RETURN NEW; ELSE raise notice '%', 'NOT creating ' || TG_RELNAME::text || ' for ' || NEW.refdate::text; RETURN NEW; END IF; END; $$ LANGUAGE plpgsql; The trigger function is hooked to the to-be-partitioned table CREATE TRIGGER XXX_trigger BEFORE INSERT ON XXX FOR EACH ROW execute procedure child_creation_trigger (); now, every time I insert a tuple into XXX, the trigger is fired and checks if the needed table exists or not. If not, it creates the table and rule and goes on. The problem is that in this case the first row is inserted into the XXX table, not in the (just created) XXX_ . Then I tried to insert the row myself from the trigger body (and return null in order to skip the original insertion), but I'm not able to do it (see the commented insert in the above IF clause) because I can't properly quote the target table name. I shoud perform a insert into child_table_name values (NEW.*); obviously written like this the plpgsql complains at runtime because child_table_name is not a table name. If, on the other side, I dynamically create the query like in execute 'insert into ' || child_table_name || ' values (NEW.*)'; it complains because NEW in the execution context is unknown. This should be a rather common problem... Isn't it? Is there a canonical way to solve it? Maybe there's a trivial answer, but I have no plpgsql programming experience. Thanks a lot in advance, Enrico Sirola [EMAIL PROTECTED]
[GENERAL] table partitioning and plpgsql functions in 8.2.3
Overview: plpgsql functions seem to ignore partitioning, even with constraint_exclusion on. Description: Version is 8.2.3 on RHEL 4, constraint_exlusion is on. I have an events table (fw_events) partitioned by an int and a date (fw_id, fw_date for discussion) following the recommendations outlined in 5.9 of the manual. FWIW, each partition holds around 1M rows. There are presently about 250 partitions (2 ids, around 4+ months of dates). explain select count(*) from fw_events where fw_id = 1 and fw_date = '2007-04-08' shows that the single partition table is examined, and results are snappy when executed. I created a function to do the same count, and it took orders of magnitude longer. I then created a plpgsql function to return the explain plan instead, which seemed to indicate the plpgsql function scans all the partitions. Shouldn't the plpgsql function honor the partitioning and only examine the single partition in the above example? Thanks in advance, Paul ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] table partitioning and plpgsql functions in 8.2.3
Apologies, I should have read more cafeully - this is already documented: 5.9.5 Caveats Constraint exclusion only works when the query's WHERE clause contains constants. A parameterized query will not be optimized, since the planner cannot know what partitions the parameter value might select at run time. For the same reason, "stable" functions such as CURRENT_DATE must be avoided. -Original Message- From: paul rivers [mailto:[EMAIL PROTECTED] Sent: Sunday, April 08, 2007 2:40 PM To: 'pgsql general' Subject: table partitioning and plpgsql functions in 8.2.3 Overview: plpgsql functions seem to ignore partitioning, even with constraint_exclusion on. Description: Version is 8.2.3 on RHEL 4, constraint_exlusion is on. I have an events table (fw_events) partitioned by an int and a date (fw_id, fw_date for discussion) following the recommendations outlined in 5.9 of the manual. FWIW, each partition holds around 1M rows. There are presently about 250 partitions (2 ids, around 4+ months of dates). explain select count(*) from fw_events where fw_id = 1 and fw_date = '2007-04-08' shows that the single partition table is examined, and results are snappy when executed. I created a function to do the same count, and it took orders of magnitude longer. I then created a plpgsql function to return the explain plan instead, which seemed to indicate the plpgsql function scans all the partitions. Shouldn't the plpgsql function honor the partitioning and only examine the single partition in the above example? Thanks in advance, Paul ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] table partitioning
Hi Gene, I'm using a cron job which creates a new partition table every night at midnight. After I create the table, I update the rule on the parent table so that new records go into the new partition table. Then I update the constraints on the tables so that constraint exclusion works. I'm doing it this way because you'll find if you have multiple rules on the parent table it might not return the proper result for the number of affected rows (might not be an issue for you). If you don't need to do it like this, just create a script that creates all the tables ahead of time. for year-month shouldn't be too many. Thanks for the info. I think I'll have to use a similar method - it's for a web-app with quite non-static data. -- Postgresql & php tutorials http://www.designmagick.com/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] table partitioning
Chris,I'm using a cron job which creates a new partition table every night at midnight. After I create the table, I update the rule on the parent table so that new records go into the new partition table. Then I update the constraints on the tables so that constraint exclusion works. I'm doing it this way because you'll find if you have multiple rules on the parent table it might not return the proper result for the number of affected rows (might not be an issue for you). If you don't need to do it like this, just create a script that creates all the tables ahead of time. for year-month shouldn't be too many. GeneOn 8/22/06, Chris <[EMAIL PROTECTED]> wrote: Hi all,I'm checking out table partitioning and wondering how to handle itdynamically.For example, if I want to partition by year & month, do I need to createall of the partitions (& rules) before hand or can I handle all of that another way (a pl/pgsql function? something else?)Any pointers would be most handy, thanks!--Postgresql & php tutorialshttp://www.designmagick.com/ ---(end of broadcast)---TIP 2: Don't 'kill -9' the postmaster-- Eugene Hart
[GENERAL] table partitioning
Hi all, I'm checking out table partitioning and wondering how to handle it dynamically. For example, if I want to partition by year & month, do I need to create all of the partitions (& rules) before hand or can I handle all of that another way (a pl/pgsql function? something else?) Any pointers would be most handy, thanks! -- Postgresql & php tutorials http://www.designmagick.com/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[GENERAL] table partitioning
A couple of days ago I announced that I wrote a JDBC driver that adds table partitioning features to databases accessed via JDBC. I also wrote: > In case you think this could be of any interest if integrated > in Postgresql (I mean if it was a core functionality of Postgresql, > not just a JDBC driver) let me know. But nobody seemed to care... Wouldn't this be an interesting feature of Postgresql? ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [GENERAL] Table partitioning for maximum speed?
> "JL" == Jean-Luc Lachance <[EMAIL PROTECTED]> writes: JL> BULL. JL> How many times does PG have to scan the whole table because of MVCC? JL> At least with partitioning there is a fighting chance that that won't be JL> necessary. Huh? His specific query was "WHERE md5 = ''". Why on earth would that force a sequence scan if it were an indexed column? Heck, the btree index should rule out 15/16ths of the rows after the first character comparison. ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [GENERAL] Table partitioning for maximum speed?
Is this partitioning like the schemas mentioned here: http://www.postgresql.org/docs/current/static/ddl-schemas.html? Would those help and increase performance? /B - Original Message - From: "Jean-Luc Lachance" <[EMAIL PROTECTED]> To: "Vivek Khera" <[EMAIL PROTECTED]> Cc: <[EMAIL PROTECTED]> Sent: Friday, October 10, 2003 14:23 Subject: Re: [GENERAL] Table partitioning for maximum speed? > BULL. > > How many times does PG have to scan the whole table because of MVCC? > At least with partitioning there is a fighting chance that that won't be > necessary. > Queries that involve the field on which the table is partitioned execute > faster by an order of magnitude. > It also helps with vaccuming as PG can vaccum only one partition at a > time. > I have 17M row table where all records get frequently updated over a > year. > I would do my own partitioning with inheritance if it was not broken. > Partitioning would be a BIG plus in my book. So would visibility of > records but that is another fight. > > JLL > > Vivek Khera wrote: > > > > >>>>> "JB" == Jeff Boes <[EMAIL PROTECTED]> writes: > > > > JB> Will a query against a table of 0.5 million rows beat a query against > > JB> a table of 7 million rows by a margin that makes it worth the hassle > > JB> of supporting 15 "extra" tables? > > > > I think you'll be better off with a single table, as you won't have > > contention for the index pages in the cache. > > > > One thing to do is to reindex reasonably often (for PG < 7.4) to avoid > > index bloat, which will make them not fit in cache. Just check the > > size of your index in the pg_class table, and when it gets big, > > reindex (assuming you do lots of updates/inserts to the table). > > > > Your table splitting solution sounds like something I'd do if I were > > forced to use mysql ;-) > > > > -- > > =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= > > Vivek Khera, Ph.D.Khera Communications, Inc. > > Internet: [EMAIL PROTECTED] Rockville, MD +1-240-453-8497 > > AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/ > > > > ---(end of broadcast)--- > > TIP 9: the planner will ignore your desire to choose an index scan if your > > joining column's datatypes do not match > > ---(end of broadcast)--- > TIP 8: explain analyze is your friend ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] Table partitioning for maximum speed?
Jean-Luc Lachance wrote: > > BULL. > > How many times does PG have to scan the whole table because of MVCC? > At least with partitioning there is a fighting chance that that won't be > necessary. > Queries that involve the field on which the table is partitioned execute > faster by an order of magnitude. > It also helps with vaccuming as PG can vaccum only one partition at a > time. > I have 17M row table where all records get frequently updated over a > year. > I would do my own partitioning with inheritance if it was not broken. > Partitioning would be a BIG plus in my book. So would visibility of > records but that is another fight. I meant to say visibility of record in the index. > > JLL > > Vivek Khera wrote: > > > > > "JB" == Jeff Boes <[EMAIL PROTECTED]> writes: > > > > JB> Will a query against a table of 0.5 million rows beat a query against > > JB> a table of 7 million rows by a margin that makes it worth the hassle > > JB> of supporting 15 "extra" tables? > > > > I think you'll be better off with a single table, as you won't have > > contention for the index pages in the cache. > > > > One thing to do is to reindex reasonably often (for PG < 7.4) to avoid > > index bloat, which will make them not fit in cache. Just check the > > size of your index in the pg_class table, and when it gets big, > > reindex (assuming you do lots of updates/inserts to the table). > > > > Your table splitting solution sounds like something I'd do if I were > > forced to use mysql ;-) > > > > -- > > =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= > > Vivek Khera, Ph.D.Khera Communications, Inc. > > Internet: [EMAIL PROTECTED] Rockville, MD +1-240-453-8497 > > AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/ > > > > ---(end of broadcast)--- > > TIP 9: the planner will ignore your desire to choose an index scan if your > > joining column's datatypes do not match > > ---(end of broadcast)--- > TIP 8: explain analyze is your friend ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [GENERAL] Table partitioning for maximum speed?
BULL. How many times does PG have to scan the whole table because of MVCC? At least with partitioning there is a fighting chance that that won't be necessary. Queries that involve the field on which the table is partitioned execute faster by an order of magnitude. It also helps with vaccuming as PG can vaccum only one partition at a time. I have 17M row table where all records get frequently updated over a year. I would do my own partitioning with inheritance if it was not broken. Partitioning would be a BIG plus in my book. So would visibility of records but that is another fight. JLL Vivek Khera wrote: > > > "JB" == Jeff Boes <[EMAIL PROTECTED]> writes: > > JB> Will a query against a table of 0.5 million rows beat a query against > JB> a table of 7 million rows by a margin that makes it worth the hassle > JB> of supporting 15 "extra" tables? > > I think you'll be better off with a single table, as you won't have > contention for the index pages in the cache. > > One thing to do is to reindex reasonably often (for PG < 7.4) to avoid > index bloat, which will make them not fit in cache. Just check the > size of your index in the pg_class table, and when it gets big, > reindex (assuming you do lots of updates/inserts to the table). > > Your table splitting solution sounds like something I'd do if I were > forced to use mysql ;-) > > -- > =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= > Vivek Khera, Ph.D.Khera Communications, Inc. > Internet: [EMAIL PROTECTED] Rockville, MD +1-240-453-8497 > AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/ > > ---(end of broadcast)--- > TIP 9: the planner will ignore your desire to choose an index scan if your > joining column's datatypes do not match ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [GENERAL] Table partitioning for maximum speed?
Jeff Boes wrote: Obviously, this is conceptually similar to what the index on the "md5" column is supposed to do for us. However, partitioning moves just a little of the processing load off the database server and onto the machine running the application. That's important, because we can afford more application machines as load increases, but we can't as easily upgrade the database server. Will a query against a table of 0.5 million rows beat a query against a table of 7 million rows by a margin that makes it worth the hassle of supporting 15 "extra" tables? I don't think 16 tables on the same server will help, but if you already have your app tier physically separate from the database tier, you could partition your data to more than one database server based on the first byte of the md5 column. I designed and built something similar a few years ago. We never got to the point where we really needed that kind of scalability, but it worked pretty well in (limited) testing. Joe ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [GENERAL] Table partitioning for maximum speed?
Bruno Wolff III wrote: The data format change suggested by someone else may be worth trying as well. In addition to their suggestions, you might experiment with keeping the hash in either 4 ints or 2 bigints. If you use bigints, you could probably just use an index on one of the bigints and have only a small chance of finding more than one row that matches. This is an interesting idea. Alternatively just use bytea and store the 16 bytes directly (i.e. no hex or base64 encoding). There is b-tree index support for bytea. Joe ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] Table partitioning for maximum speed?
> "JB" == Jeff Boes <[EMAIL PROTECTED]> writes: JB> Will a query against a table of 0.5 million rows beat a query against JB> a table of 7 million rows by a margin that makes it worth the hassle JB> of supporting 15 "extra" tables? I think you'll be better off with a single table, as you won't have contention for the index pages in the cache. One thing to do is to reindex reasonably often (for PG < 7.4) to avoid index bloat, which will make them not fit in cache. Just check the size of your index in the pg_class table, and when it gets big, reindex (assuming you do lots of updates/inserts to the table). Your table splitting solution sounds like something I'd do if I were forced to use mysql ;-) -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Vivek Khera, Ph.D.Khera Communications, Inc. Internet: [EMAIL PROTECTED] Rockville, MD +1-240-453-8497 AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/ ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Table partitioning for maximum speed?
Please keep discussions on the list so that others may learn from or comment on the suggested solutions. On Fri, Oct 10, 2003 at 11:27:50 -0400, Jeff Boes <[EMAIL PROTECTED]> wrote: > Bruno Wolff III wrote: > > >On Thu, Oct 09, 2003 at 18:37:19 +, > > Jeff Boes <[EMAIL PROTECTED]> wrote: > > > > > >> > >>The idea bandied about now is to partition this table into 16 (or 256, > >>or ...) chunks by first digit (or 2, or ...). In the simplest case, this > >>would mean: > >> > >> > > > >If there is an index on the checksum column, then you shouldn't get > >much of a speed up by partitioning the data. > >If you don't have an index on the checksum, it sounds like you should. > > > > > Yes, the table has: > >Table "public.link_checksums" > Column | Type | Modifiers > -+---+--- > md5 | character(32) | not null > link_id | integer | not null > Indexes: ix_link_checksums_pk primary key btree (md5) In that event I would expect that you might only save a few disk accesses by having a btree with fewer levels. If the query is slow, it might be doing a sequential search because of a type mismatch. You can use explain to double check what plan is being used. ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [GENERAL] Table partitioning for maximum speed?
On Fri, Oct 10, 2003 at 13:40:17 -0400, Jeff Boes <[EMAIL PROTECTED]> wrote: > > So, again: will front-loading the work by mapping the original query to > 16 (or 256) different queries by examining the first digit save us > anything? (My colleague who came up with this idea thinks so, since the > calculation will be done on a box other than the database host, and even > one disk access saved per query would outweigh the calculation.) This could potentially save you some disk accesses. If the index is mostly in cache now, you might not get a noticable benefit. > Will having 15 (or 255) additional tables make the cache behave > differently? Is there overhead associated with having another 15 (or > 255) tables? I am not sure whether or not you would use significantly more space by having multiple tables. The data format change suggested by someone else may be worth trying as well. In addition to their suggestions, you might experiment with keeping the hash in either 4 ints or 2 bigints. If you use bigints, you could probably just use an index on one of the bigints and have only a small chance of finding more than one row that matches. ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] Table partitioning for maximum speed?
On Fri, Oct 10, 2003 at 04:30:20PM -, [EMAIL PROTECTED] wrote: > One way to speed things up is to convert the entire checksum. Consider > what a md5 checksum really is: a text string representing a hexadecimal > number. Storing it as TEXT or CHAR is not as good as storing it as a > number directly. Have your application convert it to a decimal number, > and then store the checksum as type NUMERIC in the database. A subsequent idea is that with NUMERIC or other variable length fields you are wasting time, space and cache hits anyway. It would be probably faster to create a custom datatype, with fixed length for the exact size of an MD5 sum. With suitable input and output functions and all the operators you need, you will likely gain some additional performance boost. IIRC, Manfred Koizar developed a fixed-width char datatype for Shridar Daitankhar (sp?) maybe a year ago. It is probably a good starting point. Look for it in the pgsql-performance archives. -- Alvaro Herrera () "Endurecerse, pero jamás perder la ternura" (E. Guevara) ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [GENERAL] Table partitioning for maximum speed?
Bruno Wolff III wrote: On Fri, Oct 10, 2003 at 11:27:50 -0400, Jeff Boes <[EMAIL PROTECTED]> wrote: Yes, the table has: Table "public.link_checksums" Column | Type | Modifiers -+---+--- md5 | character(32) | not null link_id | integer | not null Indexes: ix_link_checksums_pk primary key btree (md5) In that event I would expect that you might only save a few disk accesses by having a btree with fewer levels. If the query is slow, it might be doing a sequential search because of a type mismatch. You can use explain to double check what plan is being used. Actually, the query is *not* slow; but since we executing it a million times a day, any savings we can realize will add up in a hurry. For example, yesterday this query resulted in the following stats: 'count' => 814621, 'avg' => '0.009', 'time' => '7674.932' That is, we executed it 814,621 times, for a total (wallclock) time spent waiting of 7,674 seconds (obviously, we have multiple backends executing). So, even if we can cut this by only 0.004, that would result in a savings of almost an hour. So, again: will front-loading the work by mapping the original query to 16 (or 256) different queries by examining the first digit save us anything? (My colleague who came up with this idea thinks so, since the calculation will be done on a box other than the database host, and even one disk access saved per query would outweigh the calculation.) Will having 15 (or 255) additional tables make the cache behave differently? Is there overhead associated with having another 15 (or 255) tables? -- Jeff Boes vox 269.226.9550 ext 24 Database Engineer fax 269.349.9076 Nexcerpt, Inc. http://www.nexcerpt.com ...Nexcerpt... Extend your Expertise ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
[GENERAL] Table partitioning?
I was browsing some database mailing lists and found this message available at: -- (http://www.phpbuilder.com/forum/read.php3?num=2&id=139678&thread=139671) I don't know MSSQL, but if it tries to compete with Oracle, it should have this funcitonality (which oracle does): divide tables into smaller tables that are searched on queries based on a set field/rule. An example of this would be: given the first character of a text column, insert the row into the table partition that has other rows with the same letter. When a query is done where mycol = 'axxs'; the first thing Oracle does is decide which table partition to use and then does the rest of the where statement. You should be able to design a database application without ever caring how many rows will be in each table...as I said, dealing with that is a DBA issue. -- It seems that, depending on how data is actually stored on disk and accessed, this method of partitioning a table internally in the database could be a benefit. Has anyone investigated this for PostgreSQL? ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html