Hello all.

I have created a function that partitions a large table into monthly 
partitions. Since the name of the table, target schema for partitions, name of 
the date field etc. are all passed as strings, the function is heavily based on 
EXECUTE statements.

My problem is the main loop, in which data for one month is moved from the old 
table to the partition table.

(1)
            EXECUTE FORMAT (
                'WITH del AS (
                     DELETE FROM %1$I.%2$I
                     WHERE %3$I >= %4$L AND %3$I < %5$L
                     RETURNING *
                 )
                 INSERT INTO %6$I.%7$I
                 SELECT * FROM del',
                p_main_schema,
                p_table_name,
                p_date_field_name,
                v_curr_month_str,
                v_curr_month_to_str,
                p_partition_schema,
                v_partition_name
            );

In the first few iterations, this runs in very good times. But as iterations 
progress, performance drops, despite the size of the date for each month being 
more or less the same. Eventually I end up with iterations that run for hours, 
when I started with only a few minutes. The odd thing is that the last 
iteration, which is actually for a month not yet inserted into that table (0 
records to move) it took 6 hours for the above statement to run!

I tried to improve this, by first testing whether there are any records for the 
current month in the table, adding:

(2)

        EXECUTE FORMAT (
            'SELECT true
             FROM %1$I.%2$I
             WHERE %3$I >= %4$L AND %3$I < %5$L
             LIMIT 1',
            p_main_schema,
            p_table_name,
            p_date_field_name,
            v_curr_month_str,
            v_curr_month_to_str
        ) INTO v_exists;

Before the above statement, and putting it in an IF statement on v_exists. 
Also, after each move, I added:

EXECUTE FORMAT ( 'ANALYZE %I.%I', p_main_schema, p_table_name );

But to no avail. In fact, in each iteration, the execution of statement 2 above 
takes more and more time.

Here is the number of rows in each month for the table I was trying to 
partition:

  count   |      the_month      
----------+---------------------
 10907117 | 2013-08-01 00:00:00
 12715234 | 2013-09-01 00:00:00
 14902928 | 2013-10-01 00:00:00
 10933566 | 2013-11-01 00:00:00
 11394906 | 2013-12-01 00:00:00
  9181051 | 2014-01-01 00:00:00
  8487028 | 2014-02-01 00:00:00
  9892981 | 2014-03-01 00:00:00
  8830191 | 2014-04-01 00:00:00
  8368638 | 2014-05-01 00:00:00
  8014685 | 2014-06-01 00:00:00
  6780589 | 2014-07-01 00:00:00


And the times for each iteration:

Month    Statement 2  Statement 1
2013-08  3 sec        3 min
2013-09  2 min        17 min
2013-10  4 min        21 min
2013-11  8 min        20 min
2013-12  9 min        32 min
2014-01  16 min       21 min
2014-02  19 min       20 min
2014-03  14 min       23 min

For April I had to cancel it in the middle. My problem is that I can't let this 
run into the evening, when we have backup followed by large data collection. 
These times are just for the given statements, and additional time is spent 
creating indexes on the partitions and so on. So this thing ran from 11:24 
until I had to cancel it at around 6PM.

Can anybody explain the performance deterioration and/or offer a suggestion for 
a different design?

TIA,
Herouth


Reply via email to