>
> yeah.  ok, next steps:
> *) can you confirm that postgres process is using high cpu (according
> to top) during stall time
>

Yes, CPU usage is spread across a lot of postmaster processes:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
29863 pgsql     20   0 3636m 102m  36m R 19.1  0.3   0:01.33 postmaster
30277 pgsql     20   0 3645m 111m  37m R 16.8  0.3   0:01.27 postmaster
11966 pgsql     20   0 3568m  22m  15m R 15.1  0.1   0:00.66 postmaster
 8073 pgsql     20   0 3602m  60m  26m S 13.6  0.2   0:00.77 postmaster
29780 pgsql     20   0 3646m 115m  43m R 13.6  0.4   0:01.13 postmaster
11865 pgsql     20   0 3606m  61m  23m S 12.8  0.2   0:01.87 postmaster
29379 pgsql     20   0 3603m  70m  30m R 12.8  0.2   0:00.80 postmaster
29727 pgsql     20   0 3616m  77m  31m R 12.5  0.2   0:00.81 postmaster




> *) if so, please strace that process and save some of the log
>

https://dl.dropbox.com/u/109778/stall_postmaster.log


> *) you're using a 'bleeding edge' kernel.  so we must be suspicious of
> a regression there, particularly in the scheduler.
>

The stalls have been observed for a while, and during that period the system
went from 3.4.* kernels to 3.5.*, but I do not rule out such a possibility.


> *) I am suspicious of spinlock issue. so, if we can't isolate the
> problem, is running a hand-compiled postgres a possibility (for lock
> stats)?
>


Yes, definitely possible. We run a manually compiled PostgreSQL anyway.
Please provide instructions.




> *) what is the output of this:
> echo /proc/sys/vm/zone_reclaim_mode
>
>
I presume you wanted cat instead of echo, and it shows 0.
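
For anyone following along, the difference between the two commands can be
sketched roughly as below. The helper name read_setting is hypothetical (it is
not from this thread), and the fallback message is only there because the file
may be absent on kernels built without NUMA support.

```shell
# echo only prints its argument, i.e. the path itself, not the setting.
echo /proc/sys/vm/zone_reclaim_mode

# Hypothetical helper: print the contents of a sysctl-style file.
# Defaults to the zone_reclaim_mode path discussed above.
read_setting() {
    f="${1:-/proc/sys/vm/zone_reclaim_mode}"
    if [ -r "$f" ]; then
        # cat prints the file contents, i.e. the current value
        cat "$f"
    else
        echo "not available: $f"
    fi
}

read_setting
```

On this system the second command is the one that shows 0.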


-- vlad
