Simon Riggs <[EMAIL PROTECTED]> wrote:
> One of the reasons why I hadn't wanted to pursue earlier ideas to use
> LOCK was that applying a lock will prevent running in parallel, which
> ultimately may prevent further performance gains.
>
> Is there a way of doing this that will allow multiple concurrent COPYs?
I think there is the same difficulty here as with parallel queries. It requires tighter communication among COPY threads, whether we use a multi-process or a multi-thread model. We have independent concurrent COPYs now; COPYs are not aware of each other because there is no intermediate state during COPY. However, COPY will have "phases" if we use bulk index build, so we will need to join the COPY threads and pass the working memory between them.

Here is a possible multi-threaded workload:

A. For each row:
   1. Parse the incoming data.
   2. Add the row to the heap.
   3. Spool index entries to each index spooler.
B. Wait for all threads.
C. Merge the spools and the corresponding existing indexes into new ones.

Phase A can run concurrently, just as COPY does now; A1 and A2 are independent jobs. We could have a shared spooler or per-thread spoolers. Phase B is needed so that all indexes are built at once; otherwise the work would be done twice. Phase C can run concurrently for each index: a thread is responsible for building one index, merging the existing index with one shared spool, or with multiple spools if we use per-thread spoolers.

One of the issues is how to pass or share spoolers between COPY threads. Another is how to make it transaction-safe: if one of the threads fails to build its index, all threads should roll back. I'm not sure yet how to do these...

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center