On Thu, 24 May 2012, Robert Haas wrote:

As you can see, raw performance isn't much worse with the larger data
sets, but scalability at high connection counts is severely degraded
once the working set no longer fits in shared_buffers.

Actually, the problem persists even when I trim the dataset size to fit within
shared_buffers.

Here is the dump (0.5 GB in size, tested with shared_buffers=10GB,
work_mem=500MB):
http://www.ast.cam.ac.uk/~koposov/files/dump.gz
The script is attached.

For my toy dataset, the run time of each thread goes up from ~6.4 seconds (single thread) to 18 seconds (multi-threaded), i.e. ~3 times worse.

Also, while running the script repeatedly on my main machine, I saw some variation, for some reason, in how much slower the threaded execution is than a single thread.

Now I see 25 seconds for the multi-threaded run vs. the same ~6 seconds for a single thread.

The oprofile output shows:
782355   21.5269  s_lock
  782355   100.000  s_lock [self]
-------------------------------------------------------------------------------
709801   19.5305  PinBuffer
  709801   100.000  PinBuffer [self]
-------------------------------------------------------------------------------
326457    8.9826  LWLockAcquire
  326457   100.000  LWLockAcquire [self]
-------------------------------------------------------------------------------
309437    8.5143  UnpinBuffer
  309437   100.000  UnpinBuffer [self]
-------------------------------------------------------------------------------
252972    6.9606  ReadBuffer_common
  252972   100.000  ReadBuffer_common [self]
-------------------------------------------------------------------------------
201558    5.5460  LockBuffer
  201558   100.000  LockBuffer [self]
------------------------------------------------------------

Interestingly, on another machine with much smaller shared memory (3G), less RAM (12G), fewer CPUs and PG 9.1 running, I was consistently getting ~7.2 vs 4.5 sec (multi vs single thread).

PS: Just in case, the CPU on the main machine I'm testing on is a Xeon(R) CPU E7- 4807 (the total number of real cores is 24).





*****************************************************
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/

Attachment: script.sh
Description: Bourne shell script

drop table _tmpXX ;
\timing
create table _tmpXX as select * from 
  ( select *, 
      (select healpixid from idt_match as m where m.transitid=o.transitid) 
        as x from idt_photoobservation_small as o offset 0
  ) as y where x%16=XX order by x;
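
For anyone trying to reproduce this, here is a minimal sketch of how the SQL template above might be driven. It is not the attached script.sh. It assumes that XX is a placeholder for the partition number 0..15 (inferred from the x%16=XX condition), that the template is saved under the made-up name query.sql, and that psql picks up the connection settings from the usual PG* environment variables.

```shell
#!/bin/sh
# Hypothetical driver; NOT the original script.sh attachment.
# Assumptions: the SQL template is saved as query.sql (made-up name),
# and XX stands for the partition number 0..15.

TEMPLATE=${TEMPLATE:-query.sql}

# Print the SQL for one partition by substituting the XX placeholder.
render() {
    sed "s/XX/$1/g" "$TEMPLATE"
}

# Run partitions $1..$2 concurrently, one psql session per partition,
# then wait for all of them to finish.
run_range() {
    i=$1
    while [ "$i" -le "$2" ]; do
        render "$i" | psql -q &
        i=$((i + 1))
    done
    wait
}
```

Calling `run_range 0 0` would give the single-thread case and `run_range 0 15` the 16-session case; the per-query times come from the `\timing` line in the template.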