On Apr 2, 2014, at 12:37 PM, Rob Sargent <robjsarg...@gmail.com> wrote:

> I'm playing with various data models to compare performance and 
> practicalities and not sure if I should be surprised by the numbers I'm 
> getting. I hope this report isn't too wishy-washy for reasoned comment.
> 
> One model says a genotype is defined as follows:
>          Table "public.oldstyle"
> +-------------+--------------+-----------+
> |   Column    |     Type     | Modifiers |
> +-------------+--------------+-----------+
> | id          | uuid         | not null  |
> | sample_name | text         | not null  |
> | marker_name | text         | not null  |
> | allele1     | character(1) |           |
> | allele2     | character(1) |           |
> +-------------+--------------+-----------+
> (0. id is a Primary Key)
> (1. Take what you will from the table name.)
> (2. I hadn't thought of "char" type at this point)
> (3. Ultimately the names would become ids, RI included)
> (4. We're loading 39 samples and ~950K markers)
> I loaded 37M+ records using jOOQ (batching every 1000 lines) in 12+ hours 
> (800+ records/sec).  Then I tried COPY and killed that after 11.25 hours, when 
> I realised that I had added a non-unique index on the name fields after the 
> first load. By that point it was on line 28301887, so ~75% done, which 
> implies it would have taken ~15 hours to complete.
> 
> Would the overhead of the index likely explain this decrease in throughput?
> 
> Impatience got the better of me and I killed the second COPY.  This time it 
> had done 54% of the file in 6.75 hours, extrapolating to roughly 12 hours to 
> do the whole thing.

That seems rather painfully slow. How exactly are you doing the bulk load? Are 
you CPU limited or disk limited?

Have you read http://www.postgresql.org/docs/current/interactive/populate.html ?
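The usual advice from that page — drop secondary indexes before the bulk load and rebuild them once afterwards — might look roughly like this for your table (the index name and file path here are illustrative guesses, not from your schema):

```sql
-- Drop the non-unique name index so COPY doesn't have to maintain it row by row
DROP INDEX IF EXISTS oldstyle_names_idx;  -- hypothetical index name

-- Do the load in one transaction to avoid per-row commit overhead
BEGIN;
COPY oldstyle (id, sample_name, marker_name, allele1, allele2)
    FROM '/path/to/genotypes.csv' WITH (FORMAT csv);  -- illustrative path
COMMIT;

-- Rebuild the index in one pass, which is far cheaper than incremental updates
CREATE INDEX oldstyle_names_idx ON oldstyle (sample_name, marker_name);
ANALYZE oldstyle;
```

Bumping maintenance_work_mem for the session before the CREATE INDEX step also tends to speed up the rebuild considerably.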

Cheers,
  Steve

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general