On Mon, 8 Aug 2005, Tom Lane wrote:
What that sounds like to me is a machine with inadequate disk I/O bandwidth.
Your earlier comment that checkpoint drives the machine into the ground
fits right into that theory, too.  You said there is "almost no IO-wait"
but are you sure you are measuring that correctly?

Reducing checkpoint_timeout to 600 seconds had a positive effect. Previous value was 1800 seconds.

We have a spare disk array from the old server and I'm planning to use it as a tablespace for the comment table (the 100M+ rows one) as Ron suggested.

Queries accumulate and when checkpointing is over, there can be
something like 400 queries running but over 50% of cpu is just idling.

400 queries?  Are you launching 400 separate backends to do that?
Some sort of connection pooling seems like a good idea, if you don't
have it in place already.  If the system's effective behavior in the
face of heavy load is to start even more concurrent backends, that
could easily drive things into the ground.

Ok, I implemented connection pooling using pgpool and it increased performance a lot! We are now delivering about 1500 dynamic pages a second without problems. Each of the eight single-cpu webservers are running a pgpool instance with 20 connections.

However, those configuration changes didn't have significant effect to oprofile results. AtEOXact_CatCache consumes even more cycles. This isn't a problem right now but it may be in the future...

CPU: AMD64 processors, speed 2190.23 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask 
of 0x00 (No unit mask) count 100000
samples  %        symbol name
1147870  21.1602  AtEOXact_CatCache
187466    3.4558  hash_seq_search
174357    3.2142  AllocSetAlloc
170896    3.1504  nocachegetattr
131724    2.4282  ExecMakeFunctionResultNoSets
125292    2.3097  SearchCatCache
117264    2.1617  StrategyDirtyBufferList
105741    1.9493  hash_search
98245     1.8111  FunctionCall2
97878     1.8043  yyparse
90932     1.6763  LWLockAcquire
83555     1.5403  LWLockRelease
81045     1.4940  _bt_compare
... and so on ...

----->8 Signigicant rows from current postgresql.conf 8<-----

max_connections = 768 # unnecessarily large with connection pooling
shared_buffers = 15000
work_mem = 2048
maintenance_work_mem = 32768
max_fsm_pages = 1000000
max_fsm_relations = 5000
bgwriter_percent = 2
fsync = true
wal_buffers = 512
checkpoint_segments = 200 # less would probably be enuff with 600sec timeout
checkpoint_timeout = 600
effective_cache_size = 500000
random_page_cost = 1.5
default_statistics_target = 150
stats_start_collector = true
stats_command_string = true


    |\__/|
    ( oo )    Kari Lavikka - [EMAIL PROTECTED] - (050) 380 3808
__ooO(  )Ooo_______ _____ ___ _ _  _   _    _      _                  _
      ""


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
      choose an index scan if your joining column's datatypes do not
      match

Reply via email to