Thank you. Sorry I have been away for a few days and couldn't thank you before.

Wouldn't this have an impact if there are things like views or functions based 
on the old table?

On 08/09/2014, at 04:57, Huang, Suya wrote:

> Instead of deleting from the original non-partition table which is not 
> efficient, you can try below approach.
>  
> Put below logic in a function as you like:
>  
> Create a new partition table.
> Insert data from original non-partition table to the correct partition of new 
> partition table.
> Build index and analyze as needed.
> Rename old non-partition table to something else.
> Rename new partition table to the correct name as you wanted.
>  
> Drop old non-partition table if you’re satisfied with current table structure.
>  
> Thanks,
> Suya
> From: pgsql-general-ow...@postgresql.org 
> [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Herouth Maoz
> Sent: Monday, September 08, 2014 12:00 AM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Decreasing performance in table partitioning
>  
> 
> Hello all.
>  
> I have created a function that partitions a large table into monthly 
> partitions. Since the name of the table, target schema for partitions, name 
> of the date field etc. are all passed as strings, the function is heavily 
> based on EXECUTE statements.
>  
> My problem is the main loop, in which data for one month is moved from the 
> old table to the partition table.
>  
> (1)
>             EXECUTE FORMAT (
>                 'WITH del AS (
>                      DELETE FROM %1$I.%2$I
>                      WHERE %3$I >= %4$L AND %3$I < %5$L
>                      RETURNING *
>                  )
>                  INSERT INTO %6$I.%7$I
>                  SELECT * FROM del',
>                 p_main_schema,
>                 p_table_name,
>                 p_date_field_name,
>                 v_curr_month_str,
>                 v_curr_month_to_str,
>                 p_partition_schema,
>                 v_partition_name
>             );
>  
> In the first few iterations, this runs in very good times. But as iterations 
> progress, performance drops, despite the size of the date for each month 
> being more or less the same. Eventually I end up with iterations that run for 
> hours, when I started with only a few minutes. The odd thing is that the last 
> iteration, which is actually for a month not yet inserted into that table (0 
> records to move) it took 6 hours for the above statement to run!
>  
> I tried to improve this, by first testing whether there are any records for 
> the current month in the table, adding:
>  
> (2)
>  
>         EXECUTE FORMAT (
>             'SELECT true
>              FROM %1$I.%2$I
>              WHERE %3$I >= %4$L AND %3$I < %5$L
>              LIMIT 1',
>             p_main_schema,
>             p_table_name,
>             p_date_field_name,
>             v_curr_month_str,
>             v_curr_month_to_str
>         ) INTO v_exists;
>  
> Before the above statement, and putting it in an IF statement on v_exists. 
> Also, after each move, I added:
>  
> EXECUTE FORMAT ( 'ANALYZE %I.%I', p_main_schema, p_table_name );
>  
> But to no avail. In fact, in each iteration, the execution of statement 2 
> above takes more and more time.
>  
> Here is the number of rows in each month for the table I was trying to 
> partition:
>  
>   count   |      the_month      
> ----------+---------------------
>  10907117 | 2013-08-01 00:00:00
>  12715234 | 2013-09-01 00:00:00
>  14902928 | 2013-10-01 00:00:00
>  10933566 | 2013-11-01 00:00:00
>  11394906 | 2013-12-01 00:00:00
>   9181051 | 2014-01-01 00:00:00
>   8487028 | 2014-02-01 00:00:00
>   9892981 | 2014-03-01 00:00:00
>   8830191 | 2014-04-01 00:00:00
>   8368638 | 2014-05-01 00:00:00
>   8014685 | 2014-06-01 00:00:00
>   6780589 | 2014-07-01 00:00:00
>  
>  
> And the times for each iteration:
>  
> Month    Statement 2  Statement 1
> 2013-08  3 sec        3 min
> 2013-09  2 min        17 min
> 2013-10  4 min        21 min
> 2013-11  8 min        20 min
> 2013-12  9 min        32 min
> 2014-01  16 min       21 min
> 2014-02  19 min       20 min
> 2014-03  14 min       23 min
>  
> For April I had to cancel it in the middle. My problem is that I can't let 
> this run into the evening, when we have backup followed by large data 
> collection. These times are just for the given statements, and additional 
> time is spent creating indexes on the partitions and so on. So this thing ran 
> from 11:24 until I had to cancel it at around 6PM.
>  
> Can anybody explain the performance deterioration and/or offer a suggestion 
> for a different design?
>  
> TIA,
> Herouth
>  
>  


--
חרות מעוז
יוניסל פתרונות סלולריים מתקדמים
☎ 03-5181717 שלוחה 742

Reply via email to