RE: [POC] Fast COPY FROM command for the table with foreign partitions

[email protected] Mon, 08 Feb 2021 23:47:30 -0800

From: Andrey V. Lepikhov <[email protected]>
> On 2/9/21 9:35 AM, [email protected] wrote:
> > * Why is a separate FDW connection established for each COPY?  To avoid
> using the same FDW connection for multiple foreign table partitions in a 
> single
> COPY run?
> With separate connection you can init a 'COPY FROM' session for each
> foreign partition just one time on partition initialization.
> >
> > * In what kind of test did you get 2-4x performance gain?  COPY into many
> foreign table partitions where the input rows are ordered randomly enough that
> many rows don't accumulate in the COPY buffer?
> I used 'INSERT INTO .. SELECT * FROM generate_series(1, N)' to generate
> test data and HASH partitioning to avoid skews.


I guess you used many hash partitions.  Sadly, The current COPY implementation 
only accumulates either 1,000 rows or 64 KB of input data (very small!) before 
flushing all CopyMultiInsertBuffers.  One CopyMultiInsertBuffer corresponds to 
one partition.  Flushing a CopyMultiInsertBuffer calls ExecForeignCopy() once, 
which connects to a remote database, runs COPY FROM STDIN, and disconnects.  
Here, the flushing trigger (1,000 rows or 64 KB input data, whichever comes 
first) is so small that if there are many target partitions, the amount of data 
for each partition is small.

Looking at the triggering threshold values, the description (of 
MAX_BUFFERED_TUPLES at least) seems to indicate that they take effect per 
CopyMultiInsertBuffer:


/*
 * No more than this many tuples per CopyMultiInsertBuffer
 *
 * Caution: Don't make this too big, as we could end up with this many
 * CopyMultiInsertBuffer items stored in CopyMultiInsertInfo's
 * multiInsertBuffers list.  Increasing this can cause quadratic growth in
* memory requirements during copies into partitioned tables with a large
 * number of partitions.
 */
#define MAX_BUFFERED_TUPLES     1000

/*
 * Flush buffers if there are >= this many bytes, as counted by the input
 * size, of tuples stored.
 */
#define MAX_BUFFERED_BYTES      65535


But these threshold take effect across all CopyMultiInsertBuffers:


/*
 * Returns true if the buffers are full
 */
static inline bool
CopyMultiInsertInfoIsFull(CopyMultiInsertInfo *miinfo)
{
    if (miinfo->bufferedTuples >= MAX_BUFFERED_TUPLES ||
        miinfo->bufferedBytes >= MAX_BUFFERED_BYTES)
        return true;
    return false;
}


So, I think the direction to take is to allow more data to accumulate before 
flushing.  I'm not very excited about the way 0003 and 0004 establishes a new 
connection for each partition; it adds flags to many places, and 
postgresfdw_xact_callback() has to be aware of COPY-specific processing.  Plus, 
we have to take care of the message difference you found in the regression test.

Why don't we focus on committing the basic part and addressing the extended 
part (0003 and 0004) separately later?  As Tang-san and you showed, the basic 
part already demonstrated impressive improvement.  If there's no objection, I'd 
like to make this ready for committer in a few days.


Regards
Takayuki Tsunakawa

RE: [POC] Fast COPY FROM command for the table with foreign partitions

Reply via email to