On Mon, Oct 12, 2015 at 11:17 AM, Shaun Thomas <bonesmo...@gmail.com> wrote:

> Hi guys,
>
> I've been doing some design investigation and ran into an interesting snag
> I didn't expect to find on 9.4 (and earlier). I wrote a quick python script
> to fork multiple simultaneous COPY commands to several separate tables and
> found that performance apparently degrades based on how many COPY commands
> are running.
>
> For instance, in the logs with one COPY, I see about one second to import
> 100k rows. At two processes, it's 2 seconds. At four processes, it's 4
> seconds. This is for each process. Thus loading 400k rows takes 16 seconds
> cumulatively. To me, it looked like some kind of locking issue, but
> pg_locks showed no waits during the load. In trying to figure this out, I
> ran across this discussion:
>
>
> http://www.postgresql.org/message-id/cab7npqqjeasxdr0rt9cjiaf9onfjojstyk18iw+oxi-obo4...@mail.gmail.com
>
> Which came after this:
>
> http://forums.enterprisedb.com/posts/list/4048.page
>
> It would appear I'm running into whatever issue the xloginsert_slots patch
> tried to address, but not much discussion exists afterwards. It's like the
> patch just kinda vanished into the ether even though it (apparently)
> massively improves PG's ability to scale data import.
>
> I should note that setting wal_level to minimal, or doing the load on
> unlogged tables completely resolves this issue. However, those are not
> acceptable settings in a production environment. Is there any other way to
> get normal parallel COPY performance, or is that just currently impossible?
>
> I also know 9.5 underwent a lot of locking improvements, so this issue may
> no longer apply there. I just haven't had a chance to repeat my tests on
> 9.5 yet.
>
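For reference, a minimal sketch of the kind of driver described above: it forks N worker processes, each COPYing 100k synthetic rows into its own table and reporting its elapsed time. This is an illustration only, not the original script; it assumes psycopg2 is installed, the DSN is valid, and tables named load_test_0 .. load_test_{N-1} (two columns: id int, payload text) already exist. The table names, row format, and DSN are all made up for the sketch. Creating the targets with CREATE UNLOGGED TABLE would reproduce the workaround mentioned in the post.

```python
import io
import time
from multiprocessing import Process

ROWS_PER_WORKER = 100_000  # the post times 100k rows per COPY


def make_rows(n):
    """Generate n CSV lines of synthetic (id, payload) data."""
    return "".join(f"{i},payload_{i}\n" for i in range(n))


def copy_worker(worker_id, dsn):
    """Open a connection, COPY the synthetic rows, and print the elapsed time."""
    import psycopg2  # imported here so the pure helpers above stay importable without it
    conn = psycopg2.connect(dsn)
    buf = io.StringIO(make_rows(ROWS_PER_WORKER))
    start = time.monotonic()
    with conn, conn.cursor() as cur:  # one transaction per worker, committed on exit
        cur.copy_expert(
            f"COPY load_test_{worker_id} (id, payload) FROM STDIN WITH (FORMAT csv)",
            buf,
        )
    print(f"worker {worker_id}: {time.monotonic() - start:.2f}s")
    conn.close()


def run(n_workers, dsn="dbname=test"):
    """Fork n_workers processes that COPY concurrently, then wait for all of them."""
    procs = [Process(target=copy_worker, args=(i, dsn)) for i in range(n_workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()


if __name__ == "__main__":
    run(4)  # compare per-worker times at 1, 2, and 4 concurrent COPYs
```

Running it at increasing worker counts and comparing the per-worker timings is what surfaces the degradation the post describes.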


Can you provide the test script?  Also, have you tuned your database for
high I/O throughput?  What is your storage system like?
