On Apr 2, 2014, at 12:37 PM, Rob Sargent <robjsarg...@gmail.com> wrote:
> I'm playing with various data models to compare performance and > practicalities and not sure if I should be surprised by the numbers I'm > getting. I hope this report isn't too wishy-washy for reasoned comment. > > One model says a genotype is defined as follows: > Table "public.oldstyle" > +-------------+--------------+-----------+ > | Column | Type | Modifiers | > +-------------+--------------+-----------+ > | id | uuid | not null | > | sample_name | text | not null | > | marker_name | text | not null | > | allele1 | character(1) | | > | allele2 | character(1) | | > +-------------+--------------+-----------+ > (0. id is a Primary Key) > (1. Take what you will from the table name.) > (2. I hadn't thought of "char" type at this point) > (3. Ultimately the names would become ids, RI included) > (4. We're loading 39 samples and ~950K markers) > I loaded 37M+ records using jOOQ (batching every 1000 lines) in 12+ hours > (800+ records/sec). Then I tried COPY and killed that after 11.25 hours when > I realised that I had added on non-unque index on the name fields after the > first load. By that point is was on line 28301887, so ~0.75 done which > implies it would have take ~15hours to complete. > > Would the overhead of the index likely explain this decrease in throughput? > > Impatience got the better of me and I killed the second COPY. This time it > had done 54% of the file in 6.75 hours, extrapolating to roughly 12 hours to > do the whole thing. That seems rather painfully slow. How exactly are you doing the bulk load? Are you CPU limited or disk limited? Have you read http://www.postgresql.org/docs/current/interactive/populate.html ? Cheers, Steve -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general