subject:"\[GENERAL\] Table partitioning"

Re: [GENERAL] Table partitioning

2013-10-28 Thread Herouth Maoz

Everything is slow about it - selects, deletes and inserts, that is. I don't do 
updates on that table.

The inserts and deletes are less of an issue because they are done once a week. 
Of course it would be nicer if they were faster, but that's less of an issue. 
The real issue is with self-joins, which are a common query. But I have indexes 
on the relevant fields: the connecting field (the one used for the self-join) 
as well as the date field. The queries are mostly of the format SELECT ... FROM 
tablename t1 left join tablename t2 on t1.indexed_field = t2.indexed_field and 
t2.boolean_field where t1.date_field >= 'date1' and t1.date_field < 'date2' and 
not t1.boolean; This can take about 15 minutes to run, depending on the date 
range. And it doesn't matter much if I put a date range within the left join, 
either.

As I mentioned in another message, I don't like the idea that my script may 
fail, and then inserts into the table might start to fail as well. I'm not 
always available to run it manually within a set time. And writing an 
automation that is different than all the other tables I maintain in that 
database makes for maintenance spaghetti. I also don't like running automated 
DDL commands. They don't play well with backups.

-הודעה מקורית-
מאת: Steve Crawford [mailto:scrawf...@pinpointresearch.com]
נשלח: ב 28/10/2013 22:31
אל: Herouth Maoz; pgsql-general@postgresql.org
נושא: Re: [GENERAL] Table partitioning

On 10/28/2013 09:27 AM, Herouth Maoz wrote:
> I have a rather large and slow table in Postgresql 9.1. I'm thinking of 
> partitioning it by months, but I don't like the idea of creating and dropping 
> tables all the time.
What is slow about it? Inserting? Selecting? Deleting? Partitioning can 
assist with some issues but does no good if what you really need is an 
index or better query. Partitioning shines as an option to manage 
archiving/purging of time-series data but only if you work with it, not 
against it.

What don't you like about creating and dropping tables? You can easily 
automate it: https://github.com/keithf4/pg_partman

>
> I'm thinking of simply creating 12 child tables, in which the check condition 
> will be, for example, date_part('month'', time_arrived) = 1 (or 2 for 
> February, 3 for March etc.).
>
> I'll just be deleting records rather than dropping tables, the same way I do 
> in my current setup. I delete a week's worth every time.
You are missing out on one of the best aspects of partitioning. Compared 
to dropping or truncating a child table, deleting is far slower and 
causes table bloat which may impact future queries.
>
> Second, when I delete (not drop!) from the mother table, are records deleted 
> automatically from the child tables or do I need to create rules/triggers for 
> that?
>

Yes unless you use the keyword "ONLY": "If specified, deletes rows from 
the named table only. When not specified, any tables inheriting from the 
named table are also processed."

Cheers,
Steve

Re: [GENERAL] Table partitioning

2013-10-28 Thread Steve Crawford


On 10/28/2013 09:27 AM, Herouth Maoz wrote:

I have a rather large and slow table in Postgresql 9.1. I'm thinking of 
partitioning it by months, but I don't like the idea of creating and dropping 
tables all the time.
What is slow about it? Inserting? Selecting? Deleting? Partitioning can 
assist with some issues but does no good if what you really need is an 
index or better query. Partitioning shines as an option to manage 
archiving/purging of time-series data but only if you work with it, not 
against it.


What don't you like about creating and dropping tables? You can easily 
automate it: https://github.com/keithf4/pg_partman




I'm thinking of simply creating 12 child tables, in which the check condition 
will be, for example, date_part('month'', time_arrived) = 1 (or 2 for February, 
3 for March etc.).

I'll just be deleting records rather than dropping tables, the same way I do in 
my current setup. I delete a week's worth every time.
You are missing out on one of the best aspects of partitioning. Compared 
to dropping or truncating a child table, deleting is far slower and 
causes table bloat which may impact future queries.


Second, when I delete (not drop!) from the mother table, are records deleted 
automatically from the child tables or do I need to create rules/triggers for 
that?



Yes unless you use the keyword "ONLY": "If specified, deletes rows from 
the named table only. When not specified, any tables inheriting from the 
named table are also processed."


Cheers,
Steve



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Table partitioning

2013-10-28 Thread Elliot


On 2013-10-28 12:47, Herouth Maoz wrote:

Thanks. Assuming there is an index on the time_arrived column, and that there 
are about 10.5 million records in each child table, how bad will performance be 
if the query actually accesses all the 12 tables? Will it be as bad as using 
the full table?


In the case where you don't add a predicate to your select statement 
that matches your constraint checks, meaning your statements are 
visiting every child table, you're not going to experience any 
performance gains over leaving everything in one table (plus you're 
adding on extra maintenance overhead and likely degrading query 
performance, even if only by a small amount).


What are your concerns with creating/dropping tables? You may have even 
better success with creating a child table for each week. Fifty-two 
extra tables is extra maintenance but the performance boost you could 
get by dividing out your data is probably significant, and that kind of 
maintenance should be automated anyway.




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Table partitioning

2013-10-28 Thread Herouth Maoz

Thanks. Assuming there is an index on the time_arrived column, and that there 
are about 10.5 million records in each child table, how bad will performance be 
if the query actually accesses all the 12 tables? Will it be as bad as using 
the full table?

On 28/10/2013, at 18:31, Elliot wrote:

> On 2013-10-28 12:27, Herouth Maoz wrote:
>> I have a rather large and slow table in Postgresql 9.1. I'm thinking of 
>> partitioning it by months, but I don't like the idea of creating and 
>> dropping tables all the time.
>> 
>> I'm thinking of simply creating 12 child tables, in which the check 
>> condition will be, for example, date_part('month'', time_arrived) = 1 (or 2 
>> for February, 3 for March etc.).
>> 
>> I'll just be deleting records rather than dropping tables, the same way I do 
>> in my current setup. I delete a week's worth every time.
>> 
>> So, I have two questions.
>> 
>> First, is constraint exclusion going to work with that kind of condition? I 
>> mean, if my WHERE clause says something like "time_arrived >= '2013-04-05' 
>> and time_arrived < '2013-04-17'", will it be able to tell that 
>> date_part("month",time_arrived) for all the records is 4, and therefore 
>> avoid selecting from any partitions other than the april one?
>> 
>> Second, when I delete (not drop!) from the mother table, are records deleted 
>> automatically from the child tables or do I need to create rules/triggers 
>> for that?
>> 
>> 
>> TIA,
>> Herouth
>> 
> 1. No - you'd need a condition like "where date_part("month", time_arrived) = 
> 1" in your select statements in order for the constraint exclusion to kick in
> 2. Yes - there is no need to create rules or triggers for deletes on the 
> parent table (check out the syntax for "delete from " versus "delete 
> from only )
> 




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Table partitioning

2013-10-28 Thread Elliot

On 2013-10-28 12:27, Herouth Maoz wrote:

I have a rather large and slow table in Postgresql 9.1. I'm thinking of
partitioning it by months, but I don't like the idea of creating and dropping
tables all the time.

I'm thinking of simply creating 12 child tables, in which the check condition
will be, for example, date_part('month'', time_arrived) = 1 (or 2 for February,
3 for March etc.).

I'll just be deleting records rather than dropping tables, the same way I do in
my current setup. I delete a week's worth every time.

So, I have two questions.

First, is constraint exclusion going to work with that kind of condition? I mean, if my WHERE clause says
something like "time_arrived >= '2013-04-05' and time_arrived < '2013-04-17'", will it be
able to tell that date_part("month",time_arrived) for all the records is 4, and therefore avoid
selecting from any partitions other than the april one?

Second, when I delete (not drop!) from the mother table, are records deleted
automatically from the child tables or do I need to create rules/triggers for
that?

TIA,
Herouth

1. No - you'd need a condition like "where date_part("month",
time_arrived) = 1" in your select statements in order for the constraint
exclusion to kick in
2. Yes - there is no need to create rules or triggers for deletes on the
parent table (check out the syntax for "delete from " versus
"delete from only )

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

55 matches

Mail list logo