Re: [PERFORM] how could select id=xx so slow?

Yan Chunlu Wed, 11 Jul 2012 09:36:55 -0700

huge thanks for the patient explanations, I think you are right, it is
really related to the IO.
I monitor the IO using iostat -x and found the utilize part reach 100%
frequently, postgresql is the only service running on that machine, so I
think it is either checkpoint or queries caused the problem.


and I agree that checkpoint may not the problem, I guess I need to tackle
those damn queries.
currently the data dir(pgsql/data/base) used 111GB disk space, some tables
has tens of millions records. could that cause the query heavy disk IO?
 when should I split the data to other machines(aka sharding)?


and you are right the machine has 16GB memory and commodity 500GB disk.
kernel: Linux adams 2.6.26-2-amd64 #1 SMP Mon Jun 13 16:29:33 UTC 2011
x86_64 GNU/Linux

by "new kernel" which version do you mean?

and about those IO intensive queries,  I can only tell the time used from
slow query log, is there anything like "explain analyze" that shows
specific information about IO usage?




On Wed, Jul 11, 2012 at 7:59 PM, Ants Aasma <[email protected]> wrote:

> On Wed, Jul 11, 2012 at 9:24 AM, Yan Chunlu <[email protected]> wrote:
> > I have logged one day data and found the checkpoint is rather
> > frequently(detail: https://gist.github.com/3088338). Not sure if it is
> > normal, but the average time of checkpoint is about 100sec~200sec, it
> seems
> > related with my settings:
> >
> > 574 checkpoint_segments = 64
> > 575 wal_keep_segments = 5000
> >
> > I set checkpoint_segments as a very large value which is because
> otherwise
> > the slave server always can not follow the master, should I lower that
> > value?
> >
> > or the slow query is about something else?  thanks!
>
> Some things to notice from the checkpoints log:
> * All chcekpoints are triggered by checkpoint_timeout, using up only a
> couple log files
> * Checkpoints write out around 40MB of buffers
> * The write out period is spread out nicely like it's supposed to but
> the sync phase is occasionally taking a very long time (more than 2
> minutes)
>
> This looks like something (not necessarily the checkpoint sync itself)
> is overloading the IO system. You might want to monitor the IO load
> with iostat and correlate it with the checkpoints and slow queries to
> find the culprit. It's also possible that something else is causing
> the issues.
>
> If the cause is checkpoints, just making them less frequent might make
> the problem worse. I'm assuming you have 16GB+ of RAM because you have
> 4GB of shared_buffers. Just making checkpoint_timeout longer will
> accumulate a larger number of dirty buffers that will clog up the IO
> queues even worse. If you are on Linux, lowering
> dirty_expire_centisecs or dirty_background_bytes might help to spread
> the load out but will make overall throughput worse.
>
> On the otherhand, if the I/O overload is from queries (more likely
> because some checkpoints sync quickly) there are no easy tuning
> answers. Making queries less IO intensive is probably the best you can
> do. From the tuning side, newer Linux kernels handle I/O fairness a
> lot better, and you could also try tweaking the I/O scheduler to
> achieve better throughput to avoid congestion or at least provide
> better latency for trivial queries. And of course its always possible
> to throw more hardware at the problem and upgrade the I/O subsystem.
>
> Ants Aasma
> --
> Cybertec Schönig & Schönig GmbH
> Gröhrmühlgasse 26
> A-2700 Wiener Neustadt
> Web: http://www.postgresql-support.de
>

Re: [PERFORM] how could select id=xx so slow?

Reply via email to