Re: Parallel copy

vignesh C Wed, 14 Oct 2020 06:22:02 -0700

On Fri, Oct 9, 2020 at 11:01 AM Amit Kapila <[email protected]> wrote:
>
> On Thu, Oct 8, 2020 at 12:14 AM vignesh C <[email protected]> wrote:
> >
> > On Mon, Sep 28, 2020 at 12:19 PM Amit Kapila <[email protected]> 
> > wrote:
> > > > > + */
> > > > > +typedef struct ParallelCopyLineBoundary
> > > > >
> > > > > Are we doing all this state management to avoid using locks while
> > > > > processing lines?  If so, I think we can use either spinlock or LWLock
> > > > > to keep the main patch simple and then provide a later patch to make
> > > > > it lock-less.  This will allow us to first focus on the main design of
> > > > > the patch rather than trying to make this datastructure processing
> > > > > lock-less in the best possible way.
> > > > >
> > > >
> > > > The steps will be more or less same if we use spinlock too. step 1, 
> > > > step 3 & step 4 will be common we have to use lock & unlock instead of 
> > > > step 2 & step 5. I feel we can retain the current implementation.
> > > >
> > >
> > > I'll study this in detail and let you know my opinion on the same but
> > > in the meantime, I don't follow one part of this comment: "If they
> > > don't follow this order the worker might process wrong line_size and
> > > leader might populate the information which worker has not yet
> > > processed or in the process of processing."
> > >
> > > Do you want to say that leader might overwrite some information which
> > > worker hasn't read yet? If so, it is not clear from the comment.
> > > Another minor point about this comment:
> > >
> >
> > Here leader and worker must follow these steps to avoid any corruption
> > or hang issue. Changed it to:
> >  * The leader & worker process access the shared line information by 
> > following
> >  * the below steps to avoid any data corruption or hang:
> >
>
> Actually, I wanted more on the lines why such corruption or hang can
> happen? It might help reviewers to understand why you have followed
> such a sequence.


There are 3 variables which the leader & worker are working on:
line_size, line_state & data. Leader will update line_state & populate
data, update line_size & line_state. Workers will wait for line_state
to be updated, once the updated leader will read the data based on the
line_size. If the worker is not synchronized wrong line_size will be
set & read wrong amount of data, anything can happen.There are 3
variables which leader & worker are working on: line_size, line_state
& data. Leader will update line_state & populate data, update
line_size & line_state. Workers will wait for line_state to be
updated, once the updated leader will read the data based on the
line_size. If the worker is not synchronized wrong line_size will be
set & read wrong amount of data, anything can happen. This is the
usual concurrency case with reader/writers. I felt that much details
need not be mentioned.

> > >
> > > How did you ensure that this is fixed? Have you tested it, if so
> > > please share the test? I see a basic problem with your fix.
> > >
> > > + /* Report WAL/buffer usage during parallel execution */
> > > + bufferusage = shm_toc_lookup(toc, PARALLEL_COPY_BUFFER_USAGE, false);
> > > + walusage = shm_toc_lookup(toc, PARALLEL_COPY_WAL_USAGE, false);
> > > + InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
> > > +   &walusage[ParallelWorkerNumber]);
> > >
> > > You need to call InstrStartParallelQuery() before the actual operation
> > > starts, without that stats won't be accurate? Also, after calling
> > > WaitForParallelWorkersToFinish(), you need to accumulate the stats
> > > collected from workers which neither you have done nor is possible
> > > with the current code in your patch because you haven't made any
> > > provision to capture them in BeginParallelCopy.
> > >
> > > I suggest you look into lazy_parallel_vacuum_indexes() and
> > > begin_parallel_vacuum() to understand how the buffer/wal usage stats
> > > are accumulated. Also, please test this functionality using
> > > pg_stat_statements.
> > >
> >
> > Made changes accordingly.
> > I have verified it using:
> > postgres=# select * from pg_stat_statements where query like '%copy%';
> >  userid | dbid  |       queryid        |
> >                          query
> >                | plans | total_plan_time |
> > min_plan_time | max_plan_time | mean_plan_time | stddev_plan_time |
> > calls | total_exec_time | min_exec_time | max_exec_time |
> > mean_exec_time | stddev_exec_time |  rows  | shared_blks_hi
> > t | shared_blks_read | shared_blks_dirtied | shared_blks_written |
> > local_blks_hit | local_blks_read | local_blks_dirtied |
> > local_blks_written | temp_blks_read | temp_blks_written | blk_
> > read_time | blk_write_time | wal_records | wal_fpi | wal_bytes
> > --------+-------+----------------------+---------------------------------------------------------------------------------------------------------------------+-------+-----------------+-
> > --------------+---------------+----------------+------------------+-------+-----------------+---------------+---------------+----------------+------------------+--------+---------------
> > --+------------------+---------------------+---------------------+----------------+-----------------+--------------------+--------------------+----------------+-------------------+-----
> > ----------+----------------+-------------+---------+-----------
> >      10 | 13743 | -6947756673093447609 | copy hw from
> > '/home/vignesh/postgres/postgres/inst/bin/hw_175000.csv' with(format
> > csv, delimiter ',')               |     0 |               0 |
> >             0 |             0 |              0 |                0 |
> >  1 |      265.195105 |    265.195105 |    265.195105 |     265.195105
> > |                0 | 175000 |            191
> > 6 |                0 |                 946 |                 946 |
> >          0 |               0 |                  0 |                  0
> > |              0 |                 0 |
> >         0 |              0 |        1116 |       0 |   3587203
> >      10 | 13743 |  8570215596364326047 | copy hw from
> > '/home/vignesh/postgres/postgres/inst/bin/hw_175000.csv' with(format
> > csv, delimiter ',', parallel '2') |     0 |               0 |
> >             0 |             0 |              0 |                0 |
> >  1 |    35668.402482 |  35668.402482 |  35668.402482 |   35668.402482
> > |                0 | 175000 |            310
> > 1 |               36 |                 952 |                 919 |
> >          0 |               0 |                  0 |                  0
> > |              0 |                 0 |
> >         0 |              0 |        1119 |       6 |   3624405
> > (2 rows)
> >
>
> I am not able to properly parse the data but If understand the wal
> data for non-parallel (1116 |       0 |   3587203) and parallel (1119
> |       6 |   3624405) case doesn't seem to be the same. Is that
> right? If so, why? Please ensure that no checkpoint happens for both
> cases.
>

I have disabled checkpoint, the results with the checkpoint disabled
are given below:
                                           | wal_records | wal_fpi | wal_bytes
Sequential Copy                   | 1116            |       0   |   3587669
Parallel Copy(1 worker)         | 1116            |       0   |   3587669
Parallel Copy(4 worker)         | 1121            |       0   |   3587668
I noticed that for 1 worker wal_records & wal_bytes are same as
sequential copy, but with different worker count I had noticed that
there is difference in wal_records & wal_bytes, I think the difference
should be ok because with more than 1 worker the order of records
processed will be different based on which worker picks which records
to process from input file. In the case of sequential copy/1 worker
the order in which the records will be processed is always in the same
order hence wal_bytes are the same.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Re: Parallel copy

Reply via email to