Re: [GENERAL] Table partitioning

2013-10-28 Thread Herouth Maoz
Everything is slow about it - selects, deletes and inserts, that is. I don't do 
updates on that table.

The inserts and deletes are less of an issue because they are done once a week. 
Of course it would be nicer if they were faster, but that's less of an issue. 
The real issue is with self-joins, which are a common query. But I have indexes 
on the relevant fields: the connecting field (the one used for the self-join) 
as well as the date field. The queries are mostly of the format SELECT ... FROM 
tablename t1 left join tablename t2 on t1.indexed_field = t2.indexed_field and 
t2.boolean_field where t1.date_field >= 'date1' and t1.date_field < 'date2' and 
not t1.boolean; This can take about 15 minutes to run, depending on the date 
range. And it doesn't matter much if I put a date range within the left join, 
either.

As I mentioned in another message, I don't like the idea that my script may 
fail, and then inserts into the table might start to fail as well. I'm not 
always available to run it manually within a set time. And writing an 
automation that is different than all the other tables I maintain in that 
database makes for maintenance spaghetti. I also don't like running automated 
DDL commands. They don't play well with backups.


-הודעה מקורית-
מאת: Steve Crawford [mailto:scrawf...@pinpointresearch.com]
נשלח: ב 28/10/2013 22:31
אל: Herouth Maoz; pgsql-general@postgresql.org
נושא: Re: [GENERAL] Table partitioning
 
On 10/28/2013 09:27 AM, Herouth Maoz wrote:
> I have a rather large and slow table in Postgresql 9.1. I'm thinking of 
> partitioning it by months, but I don't like the idea of creating and dropping 
> tables all the time.
What is slow about it? Inserting? Selecting? Deleting? Partitioning can 
assist with some issues but does no good if what you really need is an 
index or better query. Partitioning shines as an option to manage 
archiving/purging of time-series data but only if you work with it, not 
against it.

What don't you like about creating and dropping tables? You can easily 
automate it: https://github.com/keithf4/pg_partman

>
> I'm thinking of simply creating 12 child tables, in which the check condition 
> will be, for example, date_part('month'', time_arrived) = 1 (or 2 for 
> February, 3 for March etc.).
>
> I'll just be deleting records rather than dropping tables, the same way I do 
> in my current setup. I delete a week's worth every time.
You are missing out on one of the best aspects of partitioning. Compared 
to dropping or truncating a child table, deleting is far slower and 
causes table bloat which may impact future queries.
>
> Second, when I delete (not drop!) from the mother table, are records deleted 
> automatically from the child tables or do I need to create rules/triggers for 
> that?
>

Yes unless you use the keyword "ONLY": "If specified, deletes rows from 
the named table only. When not specified, any tables inheriting from the 
named table are also processed."

Cheers,
Steve




Re: [GENERAL] Table partitioning

2013-10-28 Thread Steve Crawford

On 10/28/2013 09:27 AM, Herouth Maoz wrote:

I have a rather large and slow table in Postgresql 9.1. I'm thinking of 
partitioning it by months, but I don't like the idea of creating and dropping 
tables all the time.
What is slow about it? Inserting? Selecting? Deleting? Partitioning can 
assist with some issues but does no good if what you really need is an 
index or better query. Partitioning shines as an option to manage 
archiving/purging of time-series data but only if you work with it, not 
against it.


What don't you like about creating and dropping tables? You can easily 
automate it: https://github.com/keithf4/pg_partman




I'm thinking of simply creating 12 child tables, in which the check condition 
will be, for example, date_part('month'', time_arrived) = 1 (or 2 for February, 
3 for March etc.).

I'll just be deleting records rather than dropping tables, the same way I do in 
my current setup. I delete a week's worth every time.
You are missing out on one of the best aspects of partitioning. Compared 
to dropping or truncating a child table, deleting is far slower and 
causes table bloat which may impact future queries.


Second, when I delete (not drop!) from the mother table, are records deleted 
automatically from the child tables or do I need to create rules/triggers for 
that?



Yes unless you use the keyword "ONLY": "If specified, deletes rows from 
the named table only. When not specified, any tables inheriting from the 
named table are also processed."


Cheers,
Steve



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Table partitioning

2013-10-28 Thread Elliot

On 2013-10-28 12:47, Herouth Maoz wrote:

Thanks. Assuming there is an index on the time_arrived column, and that there 
are about 10.5 million records in each child table, how bad will performance be 
if the query actually accesses all the 12 tables? Will it be as bad as using 
the full table?


In the case where you don't add a predicate to your select statement 
that matches your constraint checks, meaning your statements are 
visiting every child table, you're not going to experience any 
performance gains over leaving everything in one table (plus you're 
adding on extra maintenance overhead and likely degrading query 
performance, even if only by a small amount).


What are your concerns with creating/dropping tables? You may have even 
better success with creating a child table for each week. Fifty-two 
extra tables is extra maintenance but the performance boost you could 
get by dividing out your data is probably significant, and that kind of 
maintenance should be automated anyway.




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Table partitioning

2013-10-28 Thread Herouth Maoz
Thanks. Assuming there is an index on the time_arrived column, and that there 
are about 10.5 million records in each child table, how bad will performance be 
if the query actually accesses all the 12 tables? Will it be as bad as using 
the full table?

On 28/10/2013, at 18:31, Elliot wrote:

> On 2013-10-28 12:27, Herouth Maoz wrote:
>> I have a rather large and slow table in Postgresql 9.1. I'm thinking of 
>> partitioning it by months, but I don't like the idea of creating and 
>> dropping tables all the time.
>> 
>> I'm thinking of simply creating 12 child tables, in which the check 
>> condition will be, for example, date_part('month'', time_arrived) = 1 (or 2 
>> for February, 3 for March etc.).
>> 
>> I'll just be deleting records rather than dropping tables, the same way I do 
>> in my current setup. I delete a week's worth every time.
>> 
>> So, I have two questions.
>> 
>> First, is constraint exclusion going to work with that kind of condition? I 
>> mean, if my WHERE clause says something like "time_arrived >= '2013-04-05' 
>> and time_arrived < '2013-04-17'", will it be able to tell that 
>> date_part("month",time_arrived) for all the records is 4, and therefore 
>> avoid selecting from any partitions other than the april one?
>> 
>> Second, when I delete (not drop!) from the mother table, are records deleted 
>> automatically from the child tables or do I need to create rules/triggers 
>> for that?
>> 
>> 
>> TIA,
>> Herouth
>> 
> 1. No - you'd need a condition like "where date_part("month", time_arrived) = 
> 1" in your select statements in order for the constraint exclusion to kick in
> 2. Yes - there is no need to create rules or triggers for deletes on the 
> parent table (check out the syntax for "delete from " versus "delete 
> from only )
> 




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Table partitioning

2013-10-28 Thread Elliot

On 2013-10-28 12:27, Herouth Maoz wrote:

I have a rather large and slow table in Postgresql 9.1. I'm thinking of 
partitioning it by months, but I don't like the idea of creating and dropping 
tables all the time.

I'm thinking of simply creating 12 child tables, in which the check condition 
will be, for example, date_part('month'', time_arrived) = 1 (or 2 for February, 
3 for March etc.).

I'll just be deleting records rather than dropping tables, the same way I do in 
my current setup. I delete a week's worth every time.

So, I have two questions.

First, is constraint exclusion going to work with that kind of condition? I mean, if my WHERE clause says 
something like "time_arrived >= '2013-04-05' and time_arrived < '2013-04-17'", will it be 
able to tell that date_part("month",time_arrived) for all the records is 4, and therefore avoid 
selecting from any partitions other than the april one?

Second, when I delete (not drop!) from the mother table, are records deleted 
automatically from the child tables or do I need to create rules/triggers for 
that?


TIA,
Herouth

1. No - you'd need a condition like "where date_part("month", 
time_arrived) = 1" in your select statements in order for the constraint 
exclusion to kick in
2. Yes - there is no need to create rules or triggers for deletes on the 
parent table (check out the syntax for "delete from " versus 
"delete from only )




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Table partitioning

2013-10-28 Thread Herouth Maoz
I have a rather large and slow table in Postgresql 9.1. I'm thinking of 
partitioning it by months, but I don't like the idea of creating and dropping 
tables all the time.

I'm thinking of simply creating 12 child tables, in which the check condition 
will be, for example, date_part('month'', time_arrived) = 1 (or 2 for February, 
3 for March etc.).

I'll just be deleting records rather than dropping tables, the same way I do in 
my current setup. I delete a week's worth every time.

So, I have two questions.

First, is constraint exclusion going to work with that kind of condition? I 
mean, if my WHERE clause says something like "time_arrived >= '2013-04-05' and 
time_arrived < '2013-04-17'", will it be able to tell that 
date_part("month",time_arrived) for all the records is 4, and therefore avoid 
selecting from any partitions other than the april one?

Second, when I delete (not drop!) from the mother table, are records deleted 
automatically from the child tables or do I need to create rules/triggers for 
that?


TIA,
Herouth

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Table Partitioning

2013-05-22 Thread Richard Onorato
So I worked around most of my errors.  I removed the bigserial and used two of 
the columns as the primary key.  I am now getting the following hibernate 
exception back:
Batch update returned unexpected row count from update [0]; actual row count: 
0; expected: 1


This appears to be caused by the fact that the function is not returning back 
the row count.  I did a google search and found a few suggestions on how to 
resolve this issue, but they do not seem to work well.  I tried returning NEW, 
but that seems to cause the engine to also insert the record in the base table 
as well as a partition.  Thus I end up with 120 records when I am expecting 
just 60.

Any ideas on how I can fix this issue?
 
Regards,


Richard



 From: Richard Onorato 
To: Raghavendra  
Cc: "pgsql-general@postgresql.org"  
Sent: Wednesday, May 22, 2013 7:27 PM
Subject: Re: [GENERAL] Table Partitioning
 


Raghavendra,

I am doing my inserts via Java JPA statements embedded in my Data Access Layer. 
 I can share them if you would like to see them.
 
Regards,


Richard



 From: Raghavendra 
To: Richard Onorato  
Cc: "pgsql-general@postgresql.org"  
Sent: Wednesday, May 22, 2013 2:39 AM
Subject: Re: [GENERAL] Table Partitioning
 


On Wed, May 22, 2013 at 6:54 AM, Richard Onorato  
wrote:

Were you able to get it to insert with the bigserial being used on the table?  

Yes.
 
Every time I go to do an insert into one of the inherited tables I am now 
getting the following exception:
>org.hibernate.HibernateException: The database returned no natively generated 
>identity value
>
>

Hmm, I guess you are inserting on the parent table not directly into inherited 
table.
Can you share the INSERT statement.
 
Is auto-increment supported on table partitioning?
> 

Yes, BIGSERIAL will create a sequence that will be shared by all child 
partitions.
Check below example as per your test case, INSERT statement do not have 
BIGSERIAL column still its auto-increment and populated data in child tables. 

postgres=# insert into mymappingtable(c1,c2,c3,count,createdtime) values 
(9,20,30,1,now());
INSERT 0 0
postgres=# insert into mymappingtable(c1,c2,c3,count,createdtime) values 
(7,20,30,1,now());
INSERT 0 0
postgres=# select * from mymappingtablet5;
 id | c1 | c2 | c3 | count |           createdtime
++++---+--
  8 |  9 | 20 | 30 |     1 | 2013-05-18 02:08:33.061548+05:30 

postgres=# select * from mymappingtablet3;
 id | c1 | c2 | c3 | count |           createdtime
++++---+--
  9 |  7 | 20 | 30 |     1 | 2013-05-18 02:12:03.076529+05:30
(1 row)

(Request not to top-post please ... :)  )

---
Regards,
Raghavendra
EnterpriseDB Corporation
Blog: http://raghavt.blogspot.com/

Re: [GENERAL] Table Partitioning

2013-05-22 Thread Richard Onorato
Raghavendra,

I am doing my inserts via Java JPA statements embedded in my Data Access Layer. 
 I can share them if you would like to see them.
 
Regards,


Richard



 From: Raghavendra 
To: Richard Onorato  
Cc: "pgsql-general@postgresql.org"  
Sent: Wednesday, May 22, 2013 2:39 AM
Subject: Re: [GENERAL] Table Partitioning
 


On Wed, May 22, 2013 at 6:54 AM, Richard Onorato  
wrote:

Were you able to get it to insert with the bigserial being used on the table?  

Yes.
 
Every time I go to do an insert into one of the inherited tables I am now 
getting the following exception:
>org.hibernate.HibernateException: The database returned no natively generated 
>identity value
>
>

Hmm, I guess you are inserting on the parent table not directly into inherited 
table.
Can you share the INSERT statement.
 
Is auto-increment supported on table partitioning?
> 

Yes, BIGSERIAL will create a sequence that will be shared by all child 
partitions.
Check below example as per your test case, INSERT statement do not have 
BIGSERIAL column still its auto-increment and populated data in child tables. 

postgres=# insert into mymappingtable(c1,c2,c3,count,createdtime) values 
(9,20,30,1,now());
INSERT 0 0
postgres=# insert into mymappingtable(c1,c2,c3,count,createdtime) values 
(7,20,30,1,now());
INSERT 0 0
postgres=# select * from mymappingtablet5;
 id | c1 | c2 | c3 | count |           createdtime
++++---+--
  8 |  9 | 20 | 30 |     1 | 2013-05-18 02:08:33.061548+05:30 

postgres=# select * from mymappingtablet3;
 id | c1 | c2 | c3 | count |           createdtime
++++---+--
  9 |  7 | 20 | 30 |     1 | 2013-05-18 02:12:03.076529+05:30
(1 row)

(Request not to top-post please ... :)  )

---
Regards,
Raghavendra
EnterpriseDB Corporation
Blog: http://raghavt.blogspot.com/

Re: [GENERAL] Table Partitioning

2013-05-22 Thread Raghavendra
On Wed, May 22, 2013 at 6:54 AM, Richard Onorato
wrote:

> Were you able to get it to insert with the bigserial being used on the
> table?
>

Yes.


> Every time I go to do an insert into one of the inherited tables I am now
> getting the following exception:
> org.hibernate.HibernateException: The database returned no natively
> generated identity value
>
>
Hmm, I guess you are inserting on the parent table not directly into
inherited table.
Can you share the INSERT statement.


> Is auto-increment supported on table partitioning?
>
>

Yes, BIGSERIAL will create a sequence that will be shared by all child
partitions.
Check below example as per your test case, INSERT statement do not have
BIGSERIAL column still its auto-increment and populated data in child
tables.

postgres=# insert into mymappingtable(c1,c2,c3,count,createdtime) values
(9,20,30,1,now());
INSERT 0 0
postgres=# insert into mymappingtable(c1,c2,c3,count,createdtime) values
(7,20,30,1,now());
INSERT 0 0
postgres=# select * from mymappingtablet5;
 id | c1 | c2 | c3 | count |   createdtime
++++---+--
  8 |  9 | 20 | 30 | 1 | 2013-05-18 02:08:33.061548+05:30

postgres=# select * from mymappingtablet3;
  id | c1 | c2 | c3 | count |   createdtime
++++---+--
  9 |  7 | 20 | 30 | 1 | 2013-05-18 02:12:03.076529+05:30
(1 row)

(Request not to top-post please ... :)  )

---
Regards,
Raghavendra
EnterpriseDB Corporation
Blog: http://raghavt.blogspot.com/


Re: [GENERAL] Table Partitioning

2013-05-21 Thread Jeff Janes
On Tuesday, May 21, 2013, Richard Onorato wrote:

> I am wanting to partition my data based on a mod of one of the bigint
> columns, but when I run my insert test all the data goes into the base
> table and not the partitions.  Here is what the table looks like:
>


This works for me, on 9.2.4.  Can you show exactly how you do the inserts?

CREATE TABLE MyMappingTableT1 (PRIMARY KEY (id), CHECK((c1 % 5) = 0))
> INHERITS (MyMappingTable);
>
>
This is not going to work well.  constraint_exclusion does not understand
the modulus operator.

Cheers,

Jeff


Re: [GENERAL] Table Partitioning

2013-05-21 Thread Richard Onorato
Were you able to get it to insert with the bigserial being used on the table?  
Every time I go to do an insert into one of the inherited tables I am now 
getting the following exception:

org.hibernate.HibernateException: The database returned no natively generated 
identity value


Is auto-increment supported on table partitioning?
 
Regards,


Richard



 From: Raghavendra 
To: Richard Onorato  
Cc: "pgsql-general@postgresql.org"  
Sent: Tuesday, May 21, 2013 1:06 PM
Subject: Re: [GENERAL] Table Partitioning
 


On Tue, May 21, 2013 at 11:03 PM, Richard Onorato  
wrote:

I am wanting to partition my data based on a mod of one of the bigint columns, 
but when I run my insert test all the data goes into the base table and not the 
partitions.  Here is what the table looks like:
>
>
>CREATE table MyMappingTable ( id bigserial NOT NULL,
>                                     c1 bigInt NOT NULL,
>                                     c2 bigInt NOT NULL,
>                                     c3 bigint NOT NULL,
>                                     count bigint DEFAULT 1,
>                                     createdTime timestamp with time zone 
>default CURRENT_TIMESTAMP,
>                                     CONSTRAINT MyMappingTable_index PRIMARY 
>KEY (id) )
>with (OIDS=FALSE);
>
>
>CREATE TABLE MyMappingTableT1 (PRIMARY KEY (id), CHECK((c1 % 5) = 0)) INHERITS 
>(MyMappingTable);
>CREATE TABLE MyMappingTableT2 (PRIMARY KEY (id), CHECK((c1 % 5) = 1)) INHERITS 
>(MyMappingTable);
>CREATE TABLE MyMappingTableT3 (PRIMARY KEY (id), CHECK((c1 % 5) = 2)) INHERITS 
>(MyMappingTable);
>CREATE TABLE MyMappingTableT4 (PRIMARY KEY (id), CHECK((c1 % 5) = 3)) INHERITS 
>(MyMappingTable);
>CREATE TABLE MyMappingTableT5 (PRIMARY KEY (id), CHECK((c1 % 5) = 4)) INHERITS 
>(MyMappingTable);
>
>
>Here is the trigger function that I added to the database:
>
>
>CREATE OR REPLACE FUNCTION my_mapping_table_insert_trigger()
>RETURNS trigger AS $$
>BEGIN
>  IF ( (NEW.c1 % 5) = 0 ) THEN 
>    INSERT INTO MyMappingTableT1 VALUES (NEW.*); 
>  ELSIF ( (NEW.c1 % 5) = 1 ) THEN 
>    INSERT INTO MyMappingTableT2 VALUES (NEW.*);
>  ELSIF ( (NEW.c1 % 5) = 2 ) THEN 
>    INSERT INTO MyMappingTableT3 VALUES (NEW.*);
>  ELSIF ( (NEW.c1 % 5) = 3 ) THEN 
>    INSERT INTO MyMappingTableT4 VALUES (NEW.*);
>  ELSIF ( (NEW.c1 % 5) = 4 ) THEN 
>    INSERT INTO MyMappingTableT5 VALUES (NEW.*);
>  ELSE
>
>    RAISE EXCEPTION 'c1 mod out of range.  Something wrong with the 
>my_mapping_table_insert_trigger() function!';
>  END IF;
>  RETURN NULL;
>END;
>$$
>LANGUAGE plpgsql;
>
>
>Here is the Trigger that I added to the table:
>
>
>CREATE TRIGGER insert_my_mapping_table_trigger 
>  BEFORE INSERT ON MyMappingTable 
>  FOR EACH ROW EXECUTE PROCEDURE my_mapping_table_insert_trigger();
>
>
>SET constraint_exclusion = ON;
>
>
>
>Regards,
>
>
>Richard

I tried your test case, its working fine from my end and populating data 
properly to partition childs. 

insert into mymappingtable values (1,7,20,30,1,now());
insert into mymappingtable values (2,6,20,30,1,now());
insert into mymappingtable values (3,8,20,30,1,now());
insert into mymappingtable values (4,9,20,30,1,now());
insert into mymappingtable values (5,10,20,30,1,now());

postgres=# \dt+ MyMappingTable*
                            List of relations
 Schema |       Name       | Type  |  Owner   |    Size    | Description
+--+---+--++-
 public | mymappingtable   | table | postgres | 0 bytes    |
 public | mymappingtablet1 | table | postgres | 8192 bytes |
 public | mymappingtablet2 | table | postgres | 8192 bytes |
 public | mymappingtablet3 | table | postgres | 8192 bytes |
 public | mymappingtablet4 | table | postgres | 8192 bytes |
 public | mymappingtablet5 | table | postgres | 8192 bytes |

---
Regards,
Raghavendra
EnterpriseDB Corporation
Blog: http://raghavt.blogspot.com/

Re: [GENERAL] Table Partitioning

2013-05-21 Thread Richard Onorato
Interesting.  I wonder what I am doing wrong.  I will try and setup the 
database again and see if I can get it to work.

thank you for testing it out for me.

Richard 

On May 21, 2013, at 1:06 PM, Raghavendra  
wrote:

> On Tue, May 21, 2013 at 11:03 PM, Richard Onorato  
> wrote:
> I am wanting to partition my data based on a mod of one of the bigint 
> columns, but when I run my insert test all the data goes into the base table 
> and not the partitions.  Here is what the table looks like:
> 
> CREATE table MyMappingTable ( id bigserial NOT NULL,
>  c1 bigInt NOT NULL,
>  c2 bigInt NOT NULL,
>  c3 bigint NOT NULL,
>  count bigint DEFAULT 1,
>  createdTime timestamp with time zone 
> default CURRENT_TIMESTAMP,
>  CONSTRAINT MyMappingTable_index PRIMARY 
> KEY (id) )
> with (OIDS=FALSE);
> 
> CREATE TABLE MyMappingTableT1 (PRIMARY KEY (id), CHECK((c1 % 5) = 0)) 
> INHERITS (MyMappingTable);
> CREATE TABLE MyMappingTableT2 (PRIMARY KEY (id), CHECK((c1 % 5) = 1)) 
> INHERITS (MyMappingTable);
> CREATE TABLE MyMappingTableT3 (PRIMARY KEY (id), CHECK((c1 % 5) = 2)) 
> INHERITS (MyMappingTable);
> CREATE TABLE MyMappingTableT4 (PRIMARY KEY (id), CHECK((c1 % 5) = 3)) 
> INHERITS (MyMappingTable);
> CREATE TABLE MyMappingTableT5 (PRIMARY KEY (id), CHECK((c1 % 5) = 4)) 
> INHERITS (MyMappingTable);
> 
> Here is the trigger function that I added to the database:
> 
> CREATE OR REPLACE FUNCTION my_mapping_table_insert_trigger()
> RETURNS trigger AS $$
> BEGIN
>   IF ( (NEW.c1 % 5) = 0 ) THEN 
> INSERT INTO MyMappingTableT1 VALUES (NEW.*); 
>   ELSIF ( (NEW.c1 % 5) = 1 ) THEN 
> INSERT INTO MyMappingTableT2 VALUES (NEW.*);
>   ELSIF ( (NEW.c1 % 5) = 2 ) THEN 
> INSERT INTO MyMappingTableT3 VALUES (NEW.*);
>   ELSIF ( (NEW.c1 % 5) = 3 ) THEN 
> INSERT INTO MyMappingTableT4 VALUES (NEW.*);
>   ELSIF ( (NEW.c1 % 5) = 4 ) THEN 
> INSERT INTO MyMappingTableT5 VALUES (NEW.*);
>   ELSE
> RAISE EXCEPTION 'c1 mod out of range.  Something wrong with the 
> my_mapping_table_insert_trigger() function!';
>   END IF;
>   RETURN NULL;
> END;
> $$
> LANGUAGE plpgsql;
> 
> Here is the Trigger that I added to the table:
> 
> CREATE TRIGGER insert_my_mapping_table_trigger 
>   BEFORE INSERT ON MyMappingTable 
>   FOR EACH ROW EXECUTE PROCEDURE my_mapping_table_insert_trigger();
> 
> SET constraint_exclusion = ON;
> 
> Regards,
> 
> Richard
> 
> I tried your test case, its working fine from my end and populating data 
> properly to partition childs. 
> 
> insert into mymappingtable values (1,7,20,30,1,now());
> insert into mymappingtable values (2,6,20,30,1,now());
> insert into mymappingtable values (3,8,20,30,1,now());
> insert into mymappingtable values (4,9,20,30,1,now());
> insert into mymappingtable values (5,10,20,30,1,now());
> 
> postgres=# \dt+ MyMappingTable*
> List of relations
>  Schema |   Name   | Type  |  Owner   |Size| Description
> +--+---+--++-
>  public | mymappingtable   | table | postgres | 0 bytes|
>  public | mymappingtablet1 | table | postgres | 8192 bytes |
>  public | mymappingtablet2 | table | postgres | 8192 bytes |
>  public | mymappingtablet3 | table | postgres | 8192 bytes |
>  public | mymappingtablet4 | table | postgres | 8192 bytes |
>  public | mymappingtablet5 | table | postgres | 8192 bytes |
> 
> ---
> Regards,
> Raghavendra
> EnterpriseDB Corporation
> Blog: http://raghavt.blogspot.com/
> 



Re: [GENERAL] Table Partitioning

2013-05-21 Thread Raghavendra
On Tue, May 21, 2013 at 11:03 PM, Richard Onorato  wrote:

> I am wanting to partition my data based on a mod of one of the bigint
> columns, but when I run my insert test all the data goes into the base
> table and not the partitions.  Here is what the table looks like:
>
> CREATE table MyMappingTable ( id bigserial NOT NULL,
>  c1 bigInt NOT NULL,
>  c2 bigInt NOT NULL,
>  c3 bigint NOT NULL,
>  count bigint DEFAULT 1,
>  createdTime timestamp with time zone
> default CURRENT_TIMESTAMP,
>  CONSTRAINT MyMappingTable_index
> PRIMARY KEY (id) )
> with (OIDS=FALSE);
>
> CREATE TABLE MyMappingTableT1 (PRIMARY KEY (id), CHECK((c1 % 5) = 0))
> INHERITS (MyMappingTable);
> CREATE TABLE MyMappingTableT2 (PRIMARY KEY (id), CHECK((c1 % 5) = 1))
> INHERITS (MyMappingTable);
> CREATE TABLE MyMappingTableT3 (PRIMARY KEY (id), CHECK((c1 % 5) = 2))
> INHERITS (MyMappingTable);
> CREATE TABLE MyMappingTableT4 (PRIMARY KEY (id), CHECK((c1 % 5) = 3))
> INHERITS (MyMappingTable);
> CREATE TABLE MyMappingTableT5 (PRIMARY KEY (id), CHECK((c1 % 5) = 4))
> INHERITS (MyMappingTable);
>
> Here is the trigger function that I added to the database:
>
> CREATE OR REPLACE FUNCTION my_mapping_table_insert_trigger()
> RETURNS trigger AS $$
> BEGIN
>   IF ( (NEW.c1 % 5) = 0 ) THEN
> INSERT INTO MyMappingTableT1 VALUES (NEW.*);
>   ELSIF ( (NEW.c1 % 5) = 1 ) THEN
> INSERT INTO MyMappingTableT2 VALUES (NEW.*);
>   ELSIF ( (NEW.c1 % 5) = 2 ) THEN
> INSERT INTO MyMappingTableT3 VALUES (NEW.*);
>   ELSIF ( (NEW.c1 % 5) = 3 ) THEN
> INSERT INTO MyMappingTableT4 VALUES (NEW.*);
>   ELSIF ( (NEW.c1 % 5) = 4 ) THEN
> INSERT INTO MyMappingTableT5 VALUES (NEW.*);
>   ELSE
> RAISE EXCEPTION 'c1 mod out of range.  Something wrong with
> the my_mapping_table_insert_trigger() function!';
>   END IF;
>   RETURN NULL;
> END;
> $$
> LANGUAGE plpgsql;
>
> Here is the Trigger that I added to the table:
>
> CREATE TRIGGER insert_my_mapping_table_trigger
>   BEFORE INSERT ON MyMappingTable
>   FOR EACH ROW EXECUTE PROCEDURE my_mapping_table_insert_trigger();
>
> SET constraint_exclusion = ON;
>
> Regards,
>
> Richard
>

I tried your test case, its working fine from my end and populating data
properly to partition childs.

insert into mymappingtable values (1,7,20,30,1,now());
insert into mymappingtable values (2,6,20,30,1,now());
insert into mymappingtable values (3,8,20,30,1,now());
insert into mymappingtable values (4,9,20,30,1,now());
insert into mymappingtable values (5,10,20,30,1,now());

postgres=# \dt+ MyMappingTable*
List of relations
 Schema |   Name   | Type  |  Owner   |Size| Description
+--+---+--++-
 public | mymappingtable   | table | postgres | 0 bytes|
 public | mymappingtablet1 | table | postgres | 8192 bytes |
 public | mymappingtablet2 | table | postgres | 8192 bytes |
 public | mymappingtablet3 | table | postgres | 8192 bytes |
 public | mymappingtablet4 | table | postgres | 8192 bytes |
 public | mymappingtablet5 | table | postgres | 8192 bytes |

---
Regards,
Raghavendra
EnterpriseDB Corporation
Blog: http://raghavt.blogspot.com/


[GENERAL] Table Partitioning

2013-05-21 Thread Richard Onorato
I am wanting to partition my data based on a mod of one of the bigint columns, 
but when I run my insert test all the data goes into the base table and not the 
partitions.  Here is what the table looks like:

CREATE table MyMappingTable ( id bigserial NOT NULL,
                                     c1 bigInt NOT NULL,
                                     c2 bigInt NOT NULL,
                                     c3 bigint NOT NULL,
                                     count bigint DEFAULT 1,
                                     createdTime timestamp with time zone 
default CURRENT_TIMESTAMP,
                                     CONSTRAINT MyMappingTable_index PRIMARY 
KEY (id) )
with (OIDS=FALSE);

CREATE TABLE MyMappingTableT1 (PRIMARY KEY (id), CHECK((c1 % 5) = 0)) INHERITS 
(MyMappingTable);
CREATE TABLE MyMappingTableT2 (PRIMARY KEY (id), CHECK((c1 % 5) = 1)) INHERITS 
(MyMappingTable);
CREATE TABLE MyMappingTableT3 (PRIMARY KEY (id), CHECK((c1 % 5) = 2)) INHERITS 
(MyMappingTable);
CREATE TABLE MyMappingTableT4 (PRIMARY KEY (id), CHECK((c1 % 5) = 3)) INHERITS 
(MyMappingTable);
CREATE TABLE MyMappingTableT5 (PRIMARY KEY (id), CHECK((c1 % 5) = 4)) INHERITS 
(MyMappingTable);

Here is the trigger function that I added to the database:

CREATE OR REPLACE FUNCTION my_mapping_table_insert_trigger()
RETURNS trigger AS $$
BEGIN
  IF ( (NEW.c1 % 5) = 0 ) THEN 
    INSERT INTO MyMappingTableT1 VALUES (NEW.*); 
  ELSIF ( (NEW.c1 % 5) = 1 ) THEN 
    INSERT INTO MyMappingTableT2 VALUES (NEW.*);
  ELSIF ( (NEW.c1 % 5) = 2 ) THEN 
    INSERT INTO MyMappingTableT3 VALUES (NEW.*);
  ELSIF ( (NEW.c1 % 5) = 3 ) THEN 
    INSERT INTO MyMappingTableT4 VALUES (NEW.*);
  ELSIF ( (NEW.c1 % 5) = 4 ) THEN 
    INSERT INTO MyMappingTableT5 VALUES (NEW.*);
  ELSE

    RAISE EXCEPTION 'c1 mod out of range.  Something wrong with the 
my_mapping_table_insert_trigger() function!';
  END IF;
  RETURN NULL;
END;
$$
LANGUAGE plpgsql;

Here is the Trigger that I added to the table:

CREATE TRIGGER insert_my_mapping_table_trigger 
  BEFORE INSERT ON MyMappingTable 
  FOR EACH ROW EXECUTE PROCEDURE my_mapping_table_insert_trigger();

SET constraint_exclusion = ON;


Regards,


Richard

Re: [GENERAL] Table Partitioning Advice Request

2009-12-17 Thread Vincenzo Romano
2009/12/18 Scott Marlowe :
> The main reason to avoid rules is that they're much less efficient
> than triggers.  We use partitioning at work for our stats db, and
> partition by day, and we have about 2 years worth of stats data, so
> our insert trigger has about 700 if / elseif / else conditions in it,
> and the trigger is still very very fast.  I tried it with rules before
> and it was way too slow.

I still don't understand the poor efficiency of the rule system. I presume it's
in the implementation, not the concept.

Well, for the sake of efficiency I would like to avoid that if/elseif chain
with an extra support table to look for the right partition.
Unless you have a different advise against dynamic SQL (EXECUTE...)
to define the partition table name.

> Note that the absolute fastest way to insert to the right child table
> is to do it from the application side, choosing the right child table
> there.

I definitely disagree on this advise. For the sake of speed I could
also run some library level (and thus application level) database
solution.
The main objective for using an RDBMS (at least in my view) is to
gather in a single place (the RDBMS) the data storage and
management stuff.
Moving policies to the application level would just scatter that
stuff with faster but more error prone solutions.
And the aplpication needs to know the partitioning policies.

> We automated our trigger creation.  While the trigger is huge in terms
> of number of lines, speed wise the creation of the new trigger each
> night at midnight is measured in milliseconds

Correct.
There is still another point for me to investigate into this partitioning
area. When the partitions get created.
I see from your answers that everyone is expecting the partitions to
be already in place at the moment the trigger or the rule get fired.
In my application you can generate events for the past or the future
where there's no partition yet.

-- 
Vincenzo Romano
NON QVIETIS MARIBVS NAVTA PERITVS

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Table Partitioning Advice Request

2009-12-17 Thread Scott Marlowe
On Thu, Dec 17, 2009 at 8:29 AM, Vincenzo Romano
 wrote:
> 2009/12/17 Sam Jas 
>>
>> Rule is not advisable, Trigger is the best solution.
>
> Does the trigger solution need the TABLE CHECK constraint?
> It looks to me it won't.

The table check constraint is used during selects to pick just the
child tables needed to deliver the data you are asking for, assuming
the key value is in the where clause.

The main reason to avoid rules is that they're much less efficient
than triggers.  We use partitioning at work for our stats db, and
partition by day, and we have about 2 years worth of stats data, so
our insert trigger has about 700 if / elseif / else conditions in it,
and the trigger is still very very fast.  I tried it with rules before
and it was way too slow.

Note that the absolute fastest way to insert to the right child table
is to do it from the application side, choosing the right child table
there.

We automated our trigger creation.  While the trigger is huge in terms
of number of lines, speed wise the creation of the new trigger each
night at midnight is measured in milliseconds

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Table Partitioning Advice Request

2009-12-17 Thread Vincenzo Romano
2009/12/17 Dimitri Fontaine :
> Vincenzo Romano  writes:
>> Is there any performance study for the trigger-based implementation?
>
> Consider that if you use RULE to partition, when you DROP a partition
> the INSERTs are locked out because the query depends on the table being
> droped.
>
> That alone could lead you to stop considering RULEs for partitioning.

In that case I would also drop the relevant rule(s).
But it seems anyway that the rule system would need to scan all the
rules in order to know which one(s) to apply.
And, at least in my case, I would have hundreds of rules ...

Thanks for the hint, anyway.

-- 
Vincenzo Romano
NON QVIETIS MARIBVS NAVTA PERITVS

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Table Partitioning Advice Request

2009-12-17 Thread Dimitri Fontaine
Vincenzo Romano  writes:
> Is there any performance study for the trigger-based implementation?

Consider that if you use RULE to partition, when you DROP a partition
the INSERTs are locked out because the query depends on the table being
droped.

That alone could lead you to stop considering RULEs for partitioning.
-- 
dim

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Table Partitioning Advice Request

2009-12-17 Thread Vincenzo Romano
2009/12/17 Sam Jas 
>
> Rule is not advisable, Trigger is the best solution.

Does the trigger solution need the TABLE CHECK constraint?
It looks to me it won't.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Table Partitioning Advice Request

2009-12-17 Thread Vincenzo Romano
2009/12/17 David Fetter :
>> You really think that calling and running a trigger function for every
>> line is the best solution?
>
> Yes.  The trigger function is choosing from a small subset of the
> tables, or you know which tables exactly the rows are going into and
> insert them there.

So I understand that when a table has multiple RULEs, the planner just
browse all of them
to check the one(s) that will apply.
What I have in mind is a "side table" with some bookkeeping data to be
used by the trigger
function to select the actual table to be used.
Do you think this is faster than RULEs?

Is there any performance study for the trigger-based implementation?

> Oh, and please do trim, and don't top-post.  I've fix this in this
> post.

Sorry! :-)

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Table Partitioning Advice Request

2009-12-17 Thread David Fetter
On Thu, Dec 17, 2009 at 02:41:40PM +0100, Vincenzo Romano wrote:
> > 2009/12/17 Sam Jas 
> > Rule is not advisable, Trigger is the best solution.

> If you have to choose among a couple hundred partition tables, the
> trigger function body is far from trivial!

It's possible to generate such code and deploy it automatically.

> You really think that calling and running a trigger function for every
> line is the best solution?

Yes.  The trigger function is choosing from a small subset of the
tables, or you know which tables exactly the rows are going into and
insert them there.

Oh, and please do trim, and don't top-post.  I've fix this in this
post.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Table Partitioning Advice Request

2009-12-17 Thread Vincenzo Romano
Why?

If you have to choose among a couple hundred partition tables, the
trigger function body is far from trivial!
You really think that calling and running a trigger function for every
line is the best solution?

2009/12/17 Sam Jas 
>
> Rule is not advisable, Trigger is the best solution.
>
> ---
> Thanks
> Sam Jas
>
> --- On Thu, 17/12/09, Vincenzo Romano  wrote:
>
> From: Vincenzo Romano 
> Subject: [GENERAL] Table Partitioning Advice Request
> To: pgsql-general@postgresql.org
> Date: Thursday, 17 December, 2009, 11:05 AM
>
> Hi all.
>
> I'm planning to implement table partitioning as "suggested" (among
> other sources) in the official documentation.
> I'm using v8.4.2 at the moment.
>
> My case is far from the general one as:
> 1. I'll never UPDATE or DELETE rows from partitioned tables (only INSERTs)
> 2. Rows will be inserted one-by-one or, in the worse case, in bunches
> of two or three
> 3. Partitioning will be based upon TIMESTAMP ranges
> 4. The "virtual" tables should approach (and possibly go past) 100M rows
> 5. Most (99%) of the INSERTs (and possibly SELECTs) will actually
> operate on a rather small number of partitions (hardly more than 2).
>
> My main TABLE is like the following one:
>
> CREATE TABLE events (
>   eventtype text not null,
>   item_id int8 not null,
>   event_date timestamp not null default now(),
>   row_date timestamp not null default now(),
>   event_id serial8 primary key
> );
>
> where the partitioning would happen over the values of the event_date column.
> The row_date columns is to record the row creation TIMESTAMP as events
> can be created relatively to the past, the future or the current time.
>
> In my mind a solution which is simple to maintain is to add a simple
> RULE ... ON INSERT for every newly created partition table.
> The TRIGGER approach, in my opinion, is much more complex to maintain
> as either the body of the function needs to be rewritten as new
> partitions are added, or
> some external TABLE lookup is needed to choose the actual table name
> to be used for a (dynamically created) INSERT.
>
> Now the questions.
>
> 1.  As the number of RULEs will grow with the time, how will change
> the efficiency of the query planner while "browsing" among the RULES?
> 2. In the case the previous answer would lead to bad news for me, is
> there any better/different approach to partitioning with TRIGGERs?
> 3. Is there any more general advise for such approaches?
>
> --
> Vincenzo Romano
> NON QVIETIS MARIBVS NAVTA PERITVS
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
> 
> The INTERNET now has a personality. YOURS! See your Yahoo! Homepage.


--
Vincenzo Romano
NotOrAnd Information Technologies
cel. +39 339 8083886  | gtalk. vincenzo.rom...@notorand.it
fix. +39 0823 454163  | skype. notorand.it
fax. +39 02 700506964 | msn.   notorand.it
NON QVIETIS MARIBVS NAVTA PERITVS

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Table Partitioning Advice Request

2009-12-17 Thread Sam Jas
Rule is not advisable, Trigger is the best solution. 

---
Thanks 
Sam Jas

--- On Thu, 17/12/09, Vincenzo Romano  wrote:

From: Vincenzo Romano 
Subject: [GENERAL] Table Partitioning Advice Request
To: pgsql-general@postgresql.org
Date: Thursday, 17 December, 2009, 11:05 AM

Hi all.

I'm planning to implement table partitioning as "suggested" (among
other sources) in the official documentation.
I'm using v8.4.2 at the moment.

My case is far from the general one as:
1. I'll never UPDATE or DELETE rows from partitioned tables (only INSERTs)
2. Rows will be inserted one-by-one or, in the worse case, in bunches
of two or three
3. Partitioning will be based upon TIMESTAMP ranges
4. The "virtual" tables should approach (and possibly go past) 100M rows
5. Most (99%) of the INSERTs (and possibly SELECTs) will actually
operate on a rather small number of partitions (hardly more than 2).

My main TABLE is like the following one:

CREATE TABLE events (
  eventtype text not null,
  item_id int8 not null,
  event_date timestamp not null default now(),
  row_date timestamp not null default now(),
  event_id serial8 primary key
);

where the partitioning would happen over the values of the event_date column.
The row_date columns is to record the row creation TIMESTAMP as events
can be created relatively to the past, the future or the current time.

In my mind a solution which is simple to maintain is to add a simple
RULE ... ON INSERT for every newly created partition table.
The TRIGGER approach, in my opinion, is much more complex to maintain
as either the body of the function needs to be rewritten as new
partitions are added, or
some external TABLE lookup is needed to choose the actual table name
to be used for a (dynamically created) INSERT.

Now the questions.

1.  As the number of RULEs will grow with the time, how will change
the efficiency of the query planner while "browsing" among the RULES?
2. In the case the previous answer would lead to bad news for me, is
there any better/different approach to partitioning with TRIGGERs?
3. Is there any more general advise for such approaches?

-- 
Vincenzo Romano
NON QVIETIS MARIBVS NAVTA PERITVS

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



  The INTERNET now has a personality. YOURS! See your Yahoo! Homepage. 
http://in.yahoo.com/

[GENERAL] Table Partitioning Advice Request

2009-12-17 Thread Vincenzo Romano
Hi all.

I'm planning to implement table partitioning as "suggested" (among
other sources) in the official documentation.
I'm using v8.4.2 at the moment.

My case is far from the general one as:
1. I'll never UPDATE or DELETE rows from partitioned tables (only INSERTs)
2. Rows will be inserted one-by-one or, in the worse case, in bunches
of two or three
3. Partitioning will be based upon TIMESTAMP ranges
4. The "virtual" tables should approach (and possibly go past) 100M rows
5. Most (99%) of the INSERTs (and possibly SELECTs) will actually
operate on a rather small number of partitions (hardly more than 2).

My main TABLE is like the following one:

CREATE TABLE events (
  eventtype text not null,
  item_id int8 not null,
  event_date timestamp not null default now(),
  row_date timestamp not null default now(),
  event_id serial8 primary key
);

where the partitioning would happen over the values of the event_date column.
The row_date columns is to record the row creation TIMESTAMP as events
can be created relatively to the past, the future or the current time.

In my mind a solution which is simple to maintain is to add a simple
RULE ... ON INSERT for every newly created partition table.
The TRIGGER approach, in my opinion, is much more complex to maintain
as either the body of the function needs to be rewritten as new
partitions are added, or
some external TABLE lookup is needed to choose the actual table name
to be used for a (dynamically created) INSERT.

Now the questions.

1.  As the number of RULEs will grow with the time, how will change
the efficiency of the query planner while "browsing" among the RULES?
2. In the case the previous answer would lead to bad news for me, is
there any better/different approach to partitioning with TRIGGERs?
3. Is there any more general advise for such approaches?

-- 
Vincenzo Romano
NON QVIETIS MARIBVS NAVTA PERITVS

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Table Partitioning : Having child tables in multiple database servers

2009-07-07 Thread Dimitri Fontaine
Hi,

Ransika de Silva  writes:
> The client wants to have the freedom of increasing the processor power
> AND the storage by introducing new database servers.

I think the following document will be of interest:
  http://wiki.postgresql.org/wiki/Image:Moskva_DB_Tools.v3.pdf

It presents the architecture Skype is using, and how to achieve what you
need with plproxy and londiste and their other tools.

-- 
dim

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Table Partitioning : Having child tables in multiple database servers

2009-07-07 Thread Ransika de Silva
Hello,
Many thanks for the quick response.

Why I raised the previous question was to get an answer/solution for the
following requirement;

The client wants to have the freedom of increasing the processor power AND
the storage by introducing new database servers.

What is your thought on getting the above mentioned requirement satisfied.

Thanking you and anticipating a response soon.

@ Grzegorz Jaśkiewicz
Apologies for directly sending you the mail.

Regards
Ransika


2009/7/7 Grzegorz Jaśkiewicz 

> you should ask the question on the list (press reply-all, not just
> reply to my previous message please).
>
>
> 2009/7/7 Ransika de Silva :
> > Hello,
> > Many thanks for the quick response.
> > Why I raised the previous question was to get an answer/solution for the
> > following requirement;
> > The client wants to have the freedom of increasing the processor power
> AND
> > the storage by introducing new database servers.
> > What is your thought on getting the above mentioned
> requirement satisfied.
> > Thanking you and anticipating a response soon.
>
> > Regards,
> > Ransika
>
> > 2009/7/7 Grzegorz Jaśkiewicz 
> >>
> >> On Tue, Jul 7, 2009 at 3:32 PM, Ransika de Silva
> wrote:
> >> >
> >> > Hi all,
> >> > We have got the Table Partitioning of PostgreSQL to work on one
> database
> >> > server, where the Parent Table and Inherited Child Tables are on one
> (1)
> >> > database server.
> >> > The question which we need to get answered is, whether we can have the
> >> > Child
> >> > Tables in separate database servers?
> >> > Can you please give us your expertise thoughts on this?
> >>
> >> no you can't
> >>
> >>
> >>
> >>
> >> --
> >> GJ
> >
> >
> >
> > --
> > Ransika De Silva
> > SCMAD 1.0, SCJP 1.4,
> > BSc.(Hons) Information Systems
> >
>
>
>
> --
> GJ
>



-- 
Ransika De Silva
SCMAD 1.0, SCJP 1.4,
BSc.(Hons) Information Systems


[GENERAL] Table Partitioning : Having child tables in multiple database servers

2009-07-07 Thread Ransika de Silva
Hi all,
We have got the Table Partitioning of PostgreSQL to work on one database
server, where the Parent Table and Inherited Child Tables are on one (1)
database server.

The question which we need to get answered is, whether we can have the Child
Tables in separate database servers?

Can you please give us your expertise thoughts on this?

Thanks & Regards
-- 
Ransika De Silva


Re: [GENERAL] Table partitioning

2009-05-20 Thread Tom Lane
=?ISO-8859-1?Q?Robert_Gravsj=F6?=  writes:
> Is there a reason to avoid populating the master table other that the 
> cost of the data always being included when querying?

You can do it if you want ... I think most people think it complicates
data management, but there's nothing in the system design that prevents
it.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Table partitioning

2009-05-20 Thread Robert Gravsjö
While reading through the docs on Partitioning, 
http://www.postgresql.org/docs/current/static/ddl-partitioning.html,

I got to wonder about the example given.

The text in 5.9.2 states, item 1 in the first section, that the master 
table will contain no data. This is later confirmed by the example code 
where the trigger function, measurement_insert_trigger, redirects 
inserts to child tables and, failing to do so, raises an exception.


Is there a reason to avoid populating the master table other that the 
cost of the data always being included when querying?
My concern is that I would like to have the ELSE clause in 
measurement_insert_trigger just raise a log message and then return NEW.


Am I missing some huge drawback?

Regards,
roppert

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Table partitioning and query plans

2009-02-26 Thread Glen Parker

Hi all!

I have set up a couple of log tables in our database to use table 
partitioning.  So far it looks to be working pretty well, but I ran into 
a query plan issue that doesn't make sense to me at all.


We have a table "omslog", and a set of partition tables 
"omslog_part_", where the "#..." stuff is a serial number. 
There is a rule that redirects inserts into omslog, to the most recent 
partition table.  Pretty simple, and it works as advertised.


If I select a min() or max() aggregate against an indexed field in 
omslog, where as before it would use the index, now it does not. 
However, the index will be used if I execute the same aggregate against 
one of the partition tables.  The difference is demonstrated in the 
explains below.


Doing a sequential scan on the base table makes absolutely no sense 
whatsoever by any stretch of the imagination; as you can see, there are 
about 9  million rows there, and the planner knows it (frequent 
analyze).  What would cause the planner to be so silly in this instance? 
 This is on PG 8.1.4.  Would this happen to be something that an 
upgrade might fix?


Thanks for any insight,

-Glen



database=# explain select min(sub_system) from omslog_part_0002;

 Result  (cost=0.19..0.20 rows=1 width=0)
   InitPlan
 ->  Limit  (cost=0.00..0.19 rows=1 width=14)
   ->  Index Scan using idx_omslog_part_0002_subsys on 
omslog_part_0002  (cost=0.00..24212.57 rows=129781 width=14)

 Filter: ((sub_system)::text IS NOT NULL)


database=# explain select min(sub_system) from omslog;
 Aggregate  (cost=539575.39..539575.40 rows=1 width=14)
   ->  Append  (cost=0.00..516884.11 rows=9076511 width=14)
 ->  Seq Scan on omslog  (cost=0.00..509342.09 rows=8946709 
width=14)
 ->  Seq Scan on omslog_part_0002 omslog 
(cost=0.00..7542.02 rows=129802 width=14)



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Table partitioning based on multiple criterias possible?

2007-09-12 Thread Richard Huxton

Ow Mun Heng wrote:

On Wed, 2007-09-12 at 09:42 +0100, Richard Huxton wrote:
Well, I suppose you could partition over (month,product) but you 
probably don't want to. Sounds fiddly to manage.


Yeah.. well current schema w/ everything in 1 table and the way things
are ordered, normalised design etc, is causing PG to calc stats wrongly
and i get a 1 to 500x ratio between plan and actual analysed rows which
is causing queries to run > 2 hours w/o returning results. (this was
tested w/ on tables w/ only ~3 million rows and joining another table w/
another 3 million data - one to many relationship, I think, is causing
the mis-estimates.)


Well, let's see the query + explain. It's not in one of your previous 
posts, is it? PG doesn't know about cross-column correlations, but you 
can usually get better than 500x out. Since it's taking > 2 hours to get 
a result we'll have to settle for explain rather than explain-analyse.


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] Table partitioning based on multiple criterias possible?

2007-09-12 Thread Ow Mun Heng
On Wed, 2007-09-12 at 09:42 +0100, Richard Huxton wrote:
> Ow Mun Heng wrote:
> > New to Pg and wondering the extent of PG's table partitioning
> > capability.
> > 
> > I have a huge table > 18 million rows(growth rate ~8 million a week)
> > which I like to implement partitioning.
> 
> OK
> 
> > Initially the plan is to just partition it by date. eg: 1 partition per
> > month.
> 
> Fair enough - do you do a lot of searching by date?

Yeah.. the products we usually query on are by dates. default is 1 to 2
weeks period. So, there will be overlaps eg: 3rd week of April to 1 week
of May.. which will hit 2 partitions

> 
> > Now, I'm thinking if it's possible to also partition them by
> > product/items.
> > 
> > is it possible to do multilevel partitioning sort of like 
> > 
> > if month = July then go to July Parition
> > if Product = Shampoo then go to shampoo partition
> > elif Product = Soap then go to soap partition
> 
> Well, I suppose you could partition over (month,product) but you 
> probably don't want to. Sounds fiddly to manage.

Yeah.. well current schema w/ everything in 1 table and the way things
are ordered, normalised design etc, is causing PG to calc stats wrongly
and i get a 1 to 500x ratio between plan and actual analysed rows which
is causing queries to run > 2 hours w/o returning results. (this was
tested w/ on tables w/ only ~3 million rows and joining another table w/
another 3 million data - one to many relationship, I think, is causing
the mis-estimates.)
> 
> Have you verified that performance will be a problem without this?



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] Table partitioning based on multiple criterias possible?

2007-09-12 Thread Richard Huxton

Ow Mun Heng wrote:

New to Pg and wondering the extent of PG's table partitioning
capability.

I have a huge table > 18 million rows(growth rate ~8 million a week)
which I like to implement partitioning.


OK


Initially the plan is to just partition it by date. eg: 1 partition per
month.


Fair enough - do you do a lot of searching by date?


Now, I'm thinking if it's possible to also partition them by
product/items.

is it possible to do multilevel partitioning sort of like 


if month = July then go to July Parition
if Product = Shampoo then go to shampoo partition
elif Product = Soap then go to soap partition


Well, I suppose you could partition over (month,product) but you 
probably don't want to. Sounds fiddly to manage.


Have you verified that performance will be a problem without this?

--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 6: explain analyze is your friend


[GENERAL] Table partitioning based on multiple criterias possible?

2007-09-12 Thread Ow Mun Heng
New to Pg and wondering the extent of PG's table partitioning
capability.

I have a huge table > 18 million rows(growth rate ~8 million a week)
which I like to implement partitioning.

Initially the plan is to just partition it by date. eg: 1 partition per
month.

Now, I'm thinking if it's possible to also partition them by
product/items.

is it possible to do multilevel partitioning sort of like 

if month = July then go to July Parition
if Product = Shampoo then go to shampoo partition
elif Product = Soap then go to soap partition

something like that..

Thanks


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] table partitioning pl/pgsql helpers

2007-06-04 Thread Enrico Sirola

Hi Robert,

Il giorno 01/giu/07, alle ore 04:08, Robert Treat ha scritto:

[...]

We I set these up for our clients, I typically seperate the  
partition creation

piece from the data insertion piece.  (Mostly as partition creation,
especially with rules, is a table locking event, which is better  
done in a
non-critical path).  If you really must do it all in one go, you'll  
have a


I see, and now I agree with you it's better to decouple partition  
creation

and data insertion.

much better chance of accomplishing this using all triggers (and  
triggers are
better anyway)... i think you could do it with a combination of  
rules and a
trigger (on insert to parent, create new parition and insert into  
it and
delete from parent) but it would certainly need testing to make  
sure you dont
have multi-rule evaluation... course since your making a trigger  
anyway...


Even more importantly, I just discovered (trying and then reading  
pgsql docs) that the rule
system is completely bypassed by the COPY FROM statement, so I think  
I'll rewrite
everything using some sort of trigger-generating procedure because I  
want this stuff

to work transparently (and we do lots of copy from).
Thanks for the advices,

Enrico Sirola
[EMAIL PROTECTED]





Re: [GENERAL] table partitioning pl/pgsql helpers

2007-06-01 Thread Robert Treat
On Wednesday 30 May 2007 12:55, Enrico Sirola wrote:
> Hello,
>
> I'm trying to write a trigger on insert which should insert the row
> in another
> table. The table on which to insert the row should be selected at
> runtime and
> it is not know in advance. For example, let's say we have a table with
> two columns, a date and an integer. a row is inserted into table XXX and
>
> CREATE TABLE XXX
> (
>   refdate date;
>   x2 integer;
> )
>
> when the statement
>
> insert into XXX VALUES ('2007-11-11', 1);
>
> is executed, a trigger (or rule) should be fired to insert the row into
> table XXX_2007 (having the same schema). If the XXX_* tables are
> created beforehand
> this is not a problem because you can set up a rule for each date
> performing
> the needed insert (as documented in http://www.postgresql.org/docs/
> 8.1/interactive/ddl-partitioning.html)
>
> The problem arises when you try to extend the trigger in order to
> also dinamically
> perform table creation is the XXX_2007 doesn't exist:
>
> I have, for example:
>
>
> CREATE OR REPLACE FUNCTION add_child_table(parent_name varchar, d date)
> RETURNS varchar AS $$
> DECLARE
>  new_table_name varchar;
> BEGIN
>raise notice '%', 'creating table';
>  new_table_name := date2tblname(parent_name, d); -- converts
> table name and date into child table name
>  execute 'CREATE TABLE ' || new_table_name || ' ( ) INHERITS
> (' || parent_name || ')';
>  execute 'CREATE RULE '
>
>  || new_table_name
>  || '_insert AS ON INSERT TO '
>  || parent_name
>  || ' WHERE ( refdate = DATE ' /* refdate is the
>
> field we use to partition */
>
>  ||  || d || 
>  || ' ) DO INSTEAD INSERT INTO '
>  || new_table_name
>  || ' VALUES ( NEW.* )';
>
>  return new_table_name;
> END;
> $$ LANGUAGE plpgsql;
>
> I can use the above to add a child table and the rule to implement
> partitioning
>
> CREATE OR REPLACE FUNCTION child_creation_trigger()
> RETURNS "trigger" AS $$
> BEGIN
>  IF ( child_exist(TG_RELNAME::text, NEW.refdate) = false ) THEN
>raise notice '%', 'creating ' || TG_RELNAME::text || '
> for ' || NEW.refdate::text;
>perform add_child_table(TG_RELNAME::text, NEW.refdate);
>--insert into ' || child_table_name || ' values
> ( NEW.* );
>RETURN NEW;
>  ELSE
>raise notice '%', 'NOT creating ' || TG_RELNAME::text
>
> || ' for ' || NEW.refdate::text;
>
>RETURN NEW;
>  END IF;
> END;
> $$ LANGUAGE plpgsql;
>
> The trigger function is hooked to the to-be-partitioned table
>
>
> CREATE TRIGGER XXX_trigger BEFORE INSERT
> ON XXX FOR EACH ROW
> execute procedure child_creation_trigger ();
>
>
> now, every time I insert a tuple into XXX, the trigger is fired and
> checks if the needed table exists or
> not. If not, it creates the table and rule and goes on. The problem
> is that in this case the first
> row is inserted into the XXX table, not in the (just created)
> XXX_ . Then I tried to insert
> the row myself from the trigger body (and return null in order to
> skip the original insertion), but
> I'm not able to do it (see the commented insert in the above IF
> clause) because I can't properly
> quote the target table name.
> I shoud perform a
>
> insert into child_table_name values (NEW.*);
>
> obviously written like this the plpgsql complains at runtime because
> child_table_name is not a table name.
> If, on the other side, I dynamically create the query like in
>
> execute 'insert into ' || child_table_name || ' values (NEW.*)';
>
> it complains because NEW in the execution context is unknown.
>
> This should be a rather common problem... Isn't it? Is there a
> canonical way to
> solve it? Maybe there's a trivial answer, but I have no plpgsql
> programming
> experience.
>

We I set these up for our clients, I typically seperate the partition creation 
piece from the data insertion piece.  (Mostly as partition creation, 
especially with rules, is a table locking event, which is better done in a 
non-critical path).  If you really must do it all in one go, you'll have a 
much better chance of accomplishing this using all triggers (and triggers are 
better anyway)... i think you could do it with a combination of rules and a 
trigger (on insert to parent, create new parition and insert into it and 
delete from parent) but it would certainly need testing to make sure you dont 
have multi-rule evaluation... course since your making a trigger anyway...  

-- 
Robert Treat
Database Architect
http://www.omniti.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


[GENERAL] table partitioning pl/pgsql helpers

2007-05-30 Thread Enrico Sirola

Hello,

I'm trying to write a trigger on insert which should insert the row  
in another
table. The table on which to insert the row should be selected at  
runtime and

it is not know in advance. For example, let's say we have a table with
two columns, a date and an integer. a row is inserted into table XXX and

CREATE TABLE XXX
(
refdate date;
x2 integer;
)

when the statement

insert into XXX VALUES ('2007-11-11', 1);

is executed, a trigger (or rule) should be fired to insert the row into
table XXX_2007 (having the same schema). If the XXX_* tables are  
created beforehand
this is not a problem because you can set up a rule for each date  
performing
the needed insert (as documented in http://www.postgresql.org/docs/ 
8.1/interactive/ddl-partitioning.html)


The problem arises when you try to extend the trigger in order to  
also dinamically

perform table creation is the XXX_2007 doesn't exist:

I have, for example:


CREATE OR REPLACE FUNCTION add_child_table(parent_name varchar, d date)
RETURNS varchar AS $$
DECLARE
new_table_name varchar;
BEGIN
  raise notice '%', 'creating table';
new_table_name := date2tblname(parent_name, d); -- converts  
table name and date into child table name
execute 'CREATE TABLE ' || new_table_name || ' ( ) INHERITS  
(' || parent_name || ')';

execute 'CREATE RULE '
|| new_table_name
|| '_insert AS ON INSERT TO '
|| parent_name
|| ' WHERE ( refdate = DATE ' /* refdate is the  
field we use to partition */

||  || d || 
|| ' ) DO INSTEAD INSERT INTO '
|| new_table_name
|| ' VALUES ( NEW.* )';
return new_table_name;
END;
$$ LANGUAGE plpgsql;

I can use the above to add a child table and the rule to implement  
partitioning


CREATE OR REPLACE FUNCTION child_creation_trigger()
RETURNS "trigger" AS $$
BEGIN
IF ( child_exist(TG_RELNAME::text, NEW.refdate) = false ) THEN
  raise notice '%', 'creating ' || TG_RELNAME::text || '  
for ' || NEW.refdate::text;

  perform add_child_table(TG_RELNAME::text, NEW.refdate);
  --insert into ' || child_table_name || ' values  
( NEW.* );

  RETURN NEW;
ELSE
  raise notice '%', 'NOT creating ' || TG_RELNAME::text  
|| ' for ' || NEW.refdate::text;

  RETURN NEW;
END IF;
END;
$$ LANGUAGE plpgsql;

The trigger function is hooked to the to-be-partitioned table


CREATE TRIGGER XXX_trigger BEFORE INSERT
ON XXX FOR EACH ROW
execute procedure child_creation_trigger ();


now, every time I insert a tuple into XXX, the trigger is fired and  
checks if the needed table exists or
not. If not, it creates the table and rule and goes on. The problem  
is that in this case the first
row is inserted into the XXX table, not in the (just created)  
XXX_ . Then I tried to insert
the row myself from the trigger body (and return null in order to  
skip the original insertion), but
I'm not able to do it (see the commented insert in the above IF  
clause) because I can't properly

quote the target table name.
I shoud perform a

insert into child_table_name values (NEW.*);

obviously written like this the plpgsql complains at runtime because  
child_table_name is not a table name.

If, on the other side, I dynamically create the query like in

execute 'insert into ' || child_table_name || ' values (NEW.*)';

it complains because NEW in the execution context is unknown.

This should be a rather common problem... Isn't it? Is there a  
canonical way to
solve it? Maybe there's a trivial answer, but I have no plpgsql  
programming

experience.

Thanks a lot in advance,

Enrico Sirola
[EMAIL PROTECTED]





[GENERAL] table partitioning and plpgsql functions in 8.2.3

2007-04-11 Thread paul rivers

Overview:

plpgsql functions seem to ignore partitioning, even with 
constraint_exclusion on.


Description:

Version is 8.2.3 on RHEL 4, constraint_exlusion is on.  I have an
events table (fw_events) partitioned by an int and a date (fw_id,
fw_date for discussion) following the recommendations outlined in
5.9 of the manual.

FWIW, each partition holds around 1M rows.  There are presently
about 250 partitions (2 ids, around 4+ months of dates).

explain select count(*) from fw_events where fw_id = 1 and
fw_date = '2007-04-08' shows that the single partition table is
examined, and results are snappy when executed.

I created a function to do the same count, and it took orders of
magnitude longer.

I then created a plpgsql function to return the explain plan
instead, which seemed to indicate the plpgsql function scans all 
the partitions.

Shouldn't the plpgsql function honor the partitioning and only
examine the single partition in the above example?


Thanks in advance,
Paul




---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] table partitioning and plpgsql functions in 8.2.3

2007-04-08 Thread paul rivers

Apologies, I should have read more cafeully - this is already documented:

5.9.5 Caveats

Constraint exclusion only works when the query's WHERE clause contains
constants. A parameterized query will not be optimized, since the planner
cannot know what partitions the parameter value might select at run time.
For the same reason, "stable" functions such as CURRENT_DATE must be
avoided.




-Original Message-
From: paul rivers [mailto:[EMAIL PROTECTED] 
Sent: Sunday, April 08, 2007 2:40 PM
To: 'pgsql general'
Subject: table partitioning and plpgsql functions in 8.2.3


Overview:

plpgsql functions seem to ignore partitioning, even with 
constraint_exclusion on.


Description:

Version is 8.2.3 on RHEL 4, constraint_exlusion is on.  I have an
events table (fw_events) partitioned by an int and a date (fw_id,
fw_date for discussion) following the recommendations outlined in
5.9 of the manual.

FWIW, each partition holds around 1M rows.  There are presently
about 250 partitions (2 ids, around 4+ months of dates).

explain select count(*) from fw_events where fw_id = 1 and
fw_date = '2007-04-08' shows that the single partition table is
examined, and results are snappy when executed.

I created a function to do the same count, and it took orders of
magnitude longer.

I then created a plpgsql function to return the explain plan
instead, which seemed to indicate the plpgsql function scans all 
the partitions.

Shouldn't the plpgsql function honor the partitioning and only
examine the single partition in the above example?


Thanks in advance,
Paul




---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] table partitioning

2006-08-22 Thread Chris

Hi Gene,

I'm using a cron job which creates a new partition table every night at 
midnight. After I create the table, I update the rule on the parent 
table so that new records go into the new partition table. Then I update 
the constraints on the tables so that constraint exclusion works. I'm 
doing it this way because you'll find if you have multiple rules on the 
parent table it might not return the proper result for the number of 
affected rows (might not be an issue for you). If you don't need to do 
it like this, just create a script that creates all the tables ahead of 
time. for year-month shouldn't be too many.


Thanks for the info. I think I'll have to use a similar method - it's 
for a web-app with quite non-static data.


--
Postgresql & php tutorials
http://www.designmagick.com/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] table partitioning

2006-08-22 Thread Gene
Chris,I'm using a cron job which creates a new partition table every night at midnight. After I create the table, I update the rule on the parent table so that new records go into the new partition table. Then I update the constraints on the tables so that constraint exclusion works. I'm doing it this way because you'll find if you have multiple rules on the parent table it might not return the proper result for the number of affected rows (might not be an issue for you). If you don't need to do it like this, just create a script that creates all the tables ahead of time. for year-month shouldn't be too many.
GeneOn 8/22/06, Chris <[EMAIL PROTECTED]> wrote:
Hi all,I'm checking out table partitioning and wondering how to handle itdynamically.For example, if I want to partition by year & month, do I need to createall of the partitions (& rules) before hand or can I handle all of that
another way (a pl/pgsql function? something else?)Any pointers would be most handy, thanks!--Postgresql & php tutorialshttp://www.designmagick.com/
---(end of broadcast)---TIP 2: Don't 'kill -9' the postmaster-- Eugene Hart


[GENERAL] table partitioning

2006-08-22 Thread Chris

Hi all,

I'm checking out table partitioning and wondering how to handle it 
dynamically.


For example, if I want to partition by year & month, do I need to create 
all of the partitions (& rules) before hand or can I handle all of that 
another way (a pl/pgsql function? something else?)


Any pointers would be most handy, thanks!

--
Postgresql & php tutorials
http://www.designmagick.com/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[GENERAL] table partitioning

2004-09-16 Thread Leonardo Francalanci
A couple of days ago I announced that I wrote a JDBC driver that
adds table partitioning features to databases accessed via JDBC.

I also wrote:

> In case you think this could be of any interest if integrated
> in Postgresql (I mean if it was a core functionality of Postgresql,
> not just a JDBC driver) let me know.

But nobody seemed to care...
Wouldn't this be an interesting feature of Postgresql?


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [GENERAL] Table partitioning for maximum speed?

2003-10-10 Thread Vivek Khera
> "JL" == Jean-Luc Lachance <[EMAIL PROTECTED]> writes:


JL> BULL.
JL> How many times does PG have to scan the whole table because of MVCC?
JL> At least with partitioning there is a fighting chance that that won't be
JL> necessary.

Huh?  His specific query was "WHERE md5 = ''".  Why on earth would
that force a sequence scan if it were an indexed column?  Heck, the
btree index should rule out 15/16ths of the rows after the first
character comparison.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [GENERAL] Table partitioning for maximum speed?

2003-10-10 Thread David Busby
Is this partitioning like the schemas mentioned here:
http://www.postgresql.org/docs/current/static/ddl-schemas.html?  Would those
help and increase performance?

/B

- Original Message - 
From: "Jean-Luc Lachance" <[EMAIL PROTECTED]>
To: "Vivek Khera" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Friday, October 10, 2003 14:23
Subject: Re: [GENERAL] Table partitioning for maximum speed?


> BULL.
>
> How many times does PG have to scan the whole table because of MVCC?
> At least with partitioning there is a fighting chance that that won't be
> necessary.
> Queries that involve the field on which the table is partitioned execute
> faster by an order of magnitude.
> It also helps with vaccuming as PG can vaccum only one partition at a
> time.
> I have 17M row table where all records get frequently updated over a
> year.
> I would do my own partitioning with inheritance if it was not broken.
> Partitioning would be a BIG plus in my book. So would visibility of
> records but that is another fight.
>
> JLL
>
> Vivek Khera wrote:
> >
> > >>>>> "JB" == Jeff Boes <[EMAIL PROTECTED]> writes:
> >
> > JB> Will a query against a table of 0.5 million rows beat a query
against
> > JB> a table of 7 million rows by a margin that makes it worth the hassle
> > JB> of supporting 15 "extra" tables?
> >
> > I think you'll be better off with a single table, as you won't have
> > contention for the index pages in the cache.
> >
> > One thing to do is to reindex reasonably often (for PG < 7.4) to avoid
> > index bloat, which will make them not fit in cache.  Just check the
> > size of your index in the pg_class table, and when it gets big,
> > reindex (assuming you do lots of updates/inserts to the table).
> >
> > Your table splitting solution sounds like something I'd do if I were
> > forced to use mysql ;-)
> >
> > --
> > =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> > Vivek Khera, Ph.D.Khera Communications, Inc.
> > Internet: [EMAIL PROTECTED]   Rockville, MD   +1-240-453-8497
> > AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/
> >
> > ---(end of broadcast)---
> > TIP 9: the planner will ignore your desire to choose an index scan if
your
> >   joining column's datatypes do not match
>
> ---(end of broadcast)---
> TIP 8: explain analyze is your friend


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] Table partitioning for maximum speed?

2003-10-10 Thread Jean-Luc Lachance
Jean-Luc Lachance wrote:
> 
> BULL.
> 
> How many times does PG have to scan the whole table because of MVCC?
> At least with partitioning there is a fighting chance that that won't be
> necessary.
> Queries that involve the field on which the table is partitioned execute
> faster by an order of magnitude.
> It also helps with vaccuming as PG can vaccum only one partition at a
> time.
> I have 17M row table where all records get frequently updated over a
> year.
> I would do my own partitioning with inheritance if it was not broken.
> Partitioning would be a BIG plus in my book. So would visibility of
> records but that is another fight.


I meant to say visibility of record in the index.


> 
> JLL
> 
> Vivek Khera wrote:
> >
> > > "JB" == Jeff Boes <[EMAIL PROTECTED]> writes:
> >
> > JB> Will a query against a table of 0.5 million rows beat a query against
> > JB> a table of 7 million rows by a margin that makes it worth the hassle
> > JB> of supporting 15 "extra" tables?
> >
> > I think you'll be better off with a single table, as you won't have
> > contention for the index pages in the cache.
> >
> > One thing to do is to reindex reasonably often (for PG < 7.4) to avoid
> > index bloat, which will make them not fit in cache.  Just check the
> > size of your index in the pg_class table, and when it gets big,
> > reindex (assuming you do lots of updates/inserts to the table).
> >
> > Your table splitting solution sounds like something I'd do if I were
> > forced to use mysql ;-)
> >
> > --
> > =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> > Vivek Khera, Ph.D.Khera Communications, Inc.
> > Internet: [EMAIL PROTECTED]   Rockville, MD   +1-240-453-8497
> > AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/
> >
> > ---(end of broadcast)---
> > TIP 9: the planner will ignore your desire to choose an index scan if your
> >   joining column's datatypes do not match
> 
> ---(end of broadcast)---
> TIP 8: explain analyze is your friend

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [GENERAL] Table partitioning for maximum speed?

2003-10-10 Thread Jean-Luc Lachance
BULL.

How many times does PG have to scan the whole table because of MVCC?
At least with partitioning there is a fighting chance that that won't be
necessary.
Queries that involve the field on which the table is partitioned execute
faster by an order of magnitude.
It also helps with vaccuming as PG can vaccum only one partition at a
time.
I have 17M row table where all records get frequently updated over a
year.
I would do my own partitioning with inheritance if it was not broken.
Partitioning would be a BIG plus in my book. So would visibility of
records but that is another fight.

JLL

Vivek Khera wrote:
> 
> > "JB" == Jeff Boes <[EMAIL PROTECTED]> writes:
> 
> JB> Will a query against a table of 0.5 million rows beat a query against
> JB> a table of 7 million rows by a margin that makes it worth the hassle
> JB> of supporting 15 "extra" tables?
> 
> I think you'll be better off with a single table, as you won't have
> contention for the index pages in the cache.
> 
> One thing to do is to reindex reasonably often (for PG < 7.4) to avoid
> index bloat, which will make them not fit in cache.  Just check the
> size of your index in the pg_class table, and when it gets big,
> reindex (assuming you do lots of updates/inserts to the table).
> 
> Your table splitting solution sounds like something I'd do if I were
> forced to use mysql ;-)
> 
> --
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Vivek Khera, Ph.D.Khera Communications, Inc.
> Internet: [EMAIL PROTECTED]   Rockville, MD   +1-240-453-8497
> AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/
> 
> ---(end of broadcast)---
> TIP 9: the planner will ignore your desire to choose an index scan if your
>   joining column's datatypes do not match

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [GENERAL] Table partitioning for maximum speed?

2003-10-10 Thread Joe Conway
Jeff Boes wrote:
Obviously, this is conceptually similar to what the index on the "md5" 
column is supposed to do for us. However, partitioning moves just a 
little of the processing load off the database server and onto the 
machine running the application. That's important, because we can afford 
more application machines as load increases, but we can't as easily 
upgrade the database server.

Will a query against a table of 0.5 million rows beat a query against a 
table of 7 million rows by a margin that makes it worth the hassle of 
supporting 15 "extra" tables?

I don't think 16 tables on the same server will help, but if you already 
have your app tier physically separate from the database tier, you could 
partition your data to more than one database server based on the first 
byte of the md5 column. I designed and built something similar a few 
years ago. We never got to the point where we really needed that kind of 
scalability, but it worked pretty well in (limited) testing.

Joe

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [GENERAL] Table partitioning for maximum speed?

2003-10-10 Thread Joe Conway
Bruno Wolff III wrote:
The data format change suggested by someone else may be worth trying
as well. In addition to their suggestions, you might experiment with
keeping the hash in either 4 ints or 2 bigints. If you use bigints,
you could probably just use an index on one of the bigints and have
only a small chance of finding more than one row that matches.
This is an interesting idea. Alternatively just use bytea and store the 
16 bytes directly (i.e. no hex or base64 encoding). There is b-tree 
index support for bytea.

Joe



---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [GENERAL] Table partitioning for maximum speed?

2003-10-10 Thread Vivek Khera
> "JB" == Jeff Boes <[EMAIL PROTECTED]> writes:

JB> Will a query against a table of 0.5 million rows beat a query against
JB> a table of 7 million rows by a margin that makes it worth the hassle
JB> of supporting 15 "extra" tables?

I think you'll be better off with a single table, as you won't have
contention for the index pages in the cache.

One thing to do is to reindex reasonably often (for PG < 7.4) to avoid
index bloat, which will make them not fit in cache.  Just check the
size of your index in the pg_class table, and when it gets big,
reindex (assuming you do lots of updates/inserts to the table).

Your table splitting solution sounds like something I'd do if I were
forced to use mysql ;-)


-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.Khera Communications, Inc.
Internet: [EMAIL PROTECTED]   Rockville, MD   +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [GENERAL] Table partitioning for maximum speed?

2003-10-10 Thread Bruno Wolff III
Please keep discussions on the list so that others may learn from or comment
on the suggested solutions.

On Fri, Oct 10, 2003 at 11:27:50 -0400,
  Jeff Boes <[EMAIL PROTECTED]> wrote:
> Bruno Wolff III wrote:
> 
> >On Thu, Oct 09, 2003 at 18:37:19 +,
> > Jeff Boes <[EMAIL PROTECTED]> wrote:
> > 
> >
> >>
> >>The idea bandied about now is to partition this table into 16 (or 256, 
> >>or ...) chunks by first digit (or 2, or ...). In the simplest case, this 
> >>would mean:
> >>   
> >>
> >
> >If there is an index on the checksum column, then you shouldn't get
> >much of a speed up by partitioning the data.
> >If you don't have an index on the checksum, it sounds like you should.
> > 
> >
> Yes, the table has:
> 
>Table "public.link_checksums"
> Column  | Type  | Modifiers
> -+---+---
> md5 | character(32) | not null
> link_id | integer   | not null
> Indexes: ix_link_checksums_pk primary key btree (md5)

In that event I would expect that you might only save a few disk accesses
by having a btree with fewer levels.

If the query is slow, it might be doing a sequential search because of
a type mismatch. You can use explain to double check what plan is being
used.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [GENERAL] Table partitioning for maximum speed?

2003-10-10 Thread Bruno Wolff III
On Fri, Oct 10, 2003 at 13:40:17 -0400,
  Jeff Boes <[EMAIL PROTECTED]> wrote:
> 
> So, again: will front-loading the work by mapping the original query to 
> 16 (or 256) different queries by examining the first digit save us 
> anything? (My colleague who came up with this idea thinks so, since the 
> calculation will be done on a box other than the database host, and even 
> one disk access saved per query would outweigh the calculation.)

This could potentially save you some disk accesses. If the index is mostly
in cache now, you might not get a noticable benefit.

> Will having 15 (or 255) additional tables make the cache behave 
> differently? Is there overhead associated with having another 15 (or 
> 255) tables?

I am not sure whether or not you would use significantly more space
by having multiple tables.

The data format change suggested by someone else may be worth trying
as well. In addition to their suggestions, you might experiment with
keeping the hash in either 4 ints or 2 bigints. If you use bigints,
you could probably just use an index on one of the bigints and have
only a small chance of finding more than one row that matches.

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] Table partitioning for maximum speed?

2003-10-10 Thread Alvaro Herrera
On Fri, Oct 10, 2003 at 04:30:20PM -, [EMAIL PROTECTED] wrote:

> One way to speed things up is to convert the entire checksum. Consider 
> what a md5 checksum really is: a text string representing a hexadecimal 
> number. Storing it as TEXT or CHAR is not as good as storing it as a 
> number directly. Have your application convert it to a decimal number, 
> and then store the checksum as type NUMERIC in the database.

A subsequent idea is that with NUMERIC or other variable length fields
you are wasting time, space and cache hits anyway.  It would be probably
faster to create a custom datatype, with fixed length for the exact size
of an MD5 sum.  With suitable input and output functions and all the
operators you need, you will likely gain some additional performance
boost.

IIRC, Manfred Koizar developed a fixed-width char datatype for Shridar
Daitankhar (sp?) maybe a year ago.  It is probably a good starting
point.  Look for it in the pgsql-performance archives.

-- 
Alvaro Herrera ()
"Endurecerse, pero jamás perder la ternura" (E. Guevara)

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [GENERAL] Table partitioning for maximum speed?

2003-10-10 Thread Jeff Boes
Bruno Wolff III wrote:

On Fri, Oct 10, 2003 at 11:27:50 -0400,
 Jeff Boes <[EMAIL PROTECTED]> wrote:
 

Yes, the table has:

  Table "public.link_checksums"
Column  | Type  | Modifiers
-+---+---
md5 | character(32) | not null
link_id | integer   | not null
Indexes: ix_link_checksums_pk primary key btree (md5)
   

In that event I would expect that you might only save a few disk accesses
by having a btree with fewer levels.
If the query is slow, it might be doing a sequential search because of
a type mismatch. You can use explain to double check what plan is being
used.
 

Actually, the query is *not* slow; but since we executing it a million 
times a day, any savings we can realize will add up in a hurry. For 
example, yesterday this query resulted in the following stats:

   'count' => 814621,
   'avg' => '0.009',
   'time' => '7674.932'
That is, we executed it 814,621 times, for a total (wallclock) time 
spent waiting of 7,674 seconds (obviously, we have multiple backends 
executing). So, even if we can cut this by only 0.004, that would result 
in a savings of almost an hour.

So, again: will front-loading the work by mapping the original query to 
16 (or 256) different queries by examining the first digit save us 
anything? (My colleague who came up with this idea thinks so, since the 
calculation will be done on a box other than the database host, and even 
one disk access saved per query would outweigh the calculation.)

Will having 15 (or 255) additional tables make the cache behave 
differently? Is there overhead associated with having another 15 (or 
255) tables?

--
Jeff Boes  vox 269.226.9550 ext 24
Database Engineer fax 269.349.9076
Nexcerpt, Inc. http://www.nexcerpt.com
  ...Nexcerpt... Extend your Expertise


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


[GENERAL] Table partitioning?

2001-06-10 Thread Gerald Gutierrez


I was browsing some database mailing lists and found this message available 
at:

--
(http://www.phpbuilder.com/forum/read.php3?num=2&id=139678&thread=139671)
I don't know MSSQL, but if it tries to compete with Oracle, it should have 
this funcitonality (which oracle does):

divide tables into smaller tables that are searched on queries based on a 
set field/rule.

An example of this would be:  given the first character of a text column, 
insert the row into the table partition that has other rows with the same 
letter.   When a query is done where mycol = 'axxs'; the first thing Oracle 
does is decide which table partition to use and then does the rest of the 
where statement.

You should be able to design a database application without ever caring how 
many rows will be in each table...as I said, dealing with that is a DBA issue.
--

It seems that, depending on how data is actually stored on disk and 
accessed, this method of partitioning a table internally in the database 
could be a benefit. Has anyone investigated this for PostgreSQL?


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html