Simon Riggs <[EMAIL PROTECTED]> wrote:
> One of the reasons why I hadn't wanted to pursue earlier ideas to use
> LOCK was that applying a lock will prevent running in parallel, which
> ultimately may prevent further performance gains.
>
> Is there a way of doing this that will allow multiple concurrent COPYs?
I think there is the same difficulty here as with parallel queries. It requires tighter communication among COPY threads, whether we use a multi-process or a multi-thread model. We have independent concurrent COPYs now; COPYs are not aware of each other because there is no intermediate state during COPY. However, COPY will have "phases" if we use bulk index build, so we will need to join the COPY threads and pass the working memory between them.

Here is a possible multi-threaded workload:

A. For each row:
   1. Parse the incoming data.
   2. Add the row to the heap.
   3. Spool index entries to each index spooler.
B. Wait for all threads.
C. Merge the spools and the corresponding existing indexes into new ones.

Phase A can run concurrently, just as COPY does now; A1 and A2 are independent jobs. We could have a shared spooler or per-thread spoolers. Phase B is needed so that all indexes are built at once; otherwise the work would be done twice. Phase C can run concurrently for each index: a thread is responsible for building one index, merging the existing index with one shared spool, or with multiple spools if we use per-thread spoolers.

One of the issues is how to pass or share spoolers between COPY threads. Another is how to make it transaction-safe: if one of the threads fails to build its index, all threads should roll back. I'm not sure yet how to do these...

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center