Re: Parallel copy

Tomas Vondra Fri, 30 Oct 2020 13:57:05 -0700

On Fri, Oct 30, 2020 at 06:41:41PM +0200, Heikki Linnakangas wrote:

On 30/10/2020 18:36, Heikki Linnakangas wrote:
I find this design to be very complicated. Why does the line-boundary
information need to be in shared memory? I think this would be much
simpler if each worker grabbed a fixed-size block of raw data, and
processed that.
In your patch, the leader process scans the input to find out where one
line ends and another begins, and because of that decision, the leader
needs to make the line boundaries available in shared memory, for the
worker processes. If we moved that responsibility to the worker
processes, you wouldn't need to keep the line boundaries in shared
memory. A worker would only need to pass enough state to the next worker
to tell it where to start scanning the next block.
Here's a high-level sketch of how I'm imagining this to work:
The shared memory structure consists of a queue of blocks, arranged asa ring buffer. Each block is of fixed size, and contains 64 kB ofdata, and a few fields for coordination:
typedef struct
{
   /* Current state of the block */
   pg_atomic_uint32 state;

   /* starting offset of first line within the block */
   int     startpos;

   char    data[64 kB];
} ParallelCopyDataBlock;

Where state is one of:

enum {
 FREE,       /* buffer is empty */
 FILLED,     /* leader has filled the buffer with raw data */
READY, /* start pos has been filled in, but no worker processhas claimed the block yet */
 PROCESSING, /* worker has claimed the block, and is processing it */
}
State changes FREE -> FILLED -> READY -> PROCESSING -> FREE. As theCOPY progresses, the ring of blocks will always look something likethis:
blk 0 startpos  0: PROCESSING [worker 1]
blk 1 startpos 12: PROCESSING [worker 2]
blk 2 startpos 10: READY
blk 3 starptos  -: FILLED
blk 4 startpos  -: FILLED
blk 5 starptos  -: FILLED
blk 6 startpos  -: FREE
blk 7 startpos  -: FREE
Typically, each worker process is busy processing a block. After theblocks being processed, there is one block in READY state, and afterthat, blocks in FILLED state.
Leader process:
The leader process is simple. It picks the next FREE buffer, fills itwith raw data from the file, and marks it as FILLED. If no buffers areFREE, wait.
Worker process:

1. Claim next READY block from queue, by changing its state to
  PROCESSING. If the next block is not READY yet, wait until it is.

2. Start scanning the block from 'startpos', finding end-of-line
  markers. (in CSV mode, need to track when we're in-quotes).

3. When you reach the end of the block, if the last line continues to
  next block, wait for the next block to become FILLED. Peek into the
  next block, and copy the remaining part of the split line to a local
  buffer, and set the 'startpos' on the next block to point to the end
  of the split line. Mark the next block as READY.

4. Process all the lines in the block, call input functions, insert
  rows.

5. Mark the block as DONE.
In this design, you don't need to keep line boundaries in sharedmemory, because each worker process is responsible for finding theline boundaries of its own block.
There's a point of serialization here, in that the next block cannotbe processed, until the worker working on the previous block hasfinished scanning the EOLs, and set the starting position on the nextblock, putting it in READY state. That's not very different from yourpatch, where you had a similar point of serialization because theleader scanned the EOLs, but I think the coordination betweenprocesses is simpler here.


I agree this design looks simpler. I'm a bit worried about serializing
the parsing like this, though. It's true the current approach (where the
first phase of parsing happens in the leader) has a similar issue, but I
think it would be easier to improve that in that design.

My plan was to parallelize the parsing roughly like this:

1) split the input buffer into smaller chunks

2) let workers scan the buffers and record positions of interesting
characters (delimiters, quotes, ...) and pass it back to the leader

3) use the information to actually parse the input data (we only need to
look at the interesting characters, skipping large parts of data)

4) pass the parsed chunks to workers, just like in the current patch


But maybe something like that would be possible even with the approach
you propose - we could have a special parse phase for processing each
buffer, where any worker could look for the special characters, record
the positions in a bitmap next to the buffer. So the whole sequence of
states would look something like this:

    EMPTY
    FILLED
    PARSED
    READY
    PROCESSING

Of course, the question is whether parsing really is sufficiently
expensive for this to be worth it.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Parallel copy

Reply via email to