Re: [GENERAL] intermittant performance problem

Mike Charnoky Wed, 25 Mar 2009 09:12:01 -0700

Mike Charnoky wrote:

Scott Marlowe wrote:

On Mon, Mar 9, 2009 at 8:21 PM, Mike Charnoky <n...@nextbus.com> wrote:

The random sampling query is normally pretty snappy.  It usually takes on
the order of 1 second to sample a few thousand rows of data out of a few
million.  The sampling is consistently quick, too.  However, on some days,
the sampling starts off quick, then when the process starts sampling from a
different subset of data (different range of times for the same day), the
sampling query takes a couple minutes.


Then definitely look at saving explain plans before execution to
compare fast to slow runs.  This definitely sounds like ocassionally
bad query plans to me so far.


Tom Lane wrote:

Mike Charnoky <n...@nextbus.com> writes:

The sampling query which runs really slow on some days looks something
like this:
INSERT INTO sampled_data
  (item_name, timestmp, ... )
  SELECT item_name, timestmp, ... )
  FROM raw_data
  WHERE timestmp >= ? and timestmp < ?
  AND item_name=?
  AND some_data_field NOTNULL
  ORDER BY random()
  LIMIT ?;

Hmph, I'd expect that that would run pretty slowly *all* the time :-(.
There's no good way to optimize "ORDER BY random()".  However, it seems
like the first thing you should do is modify the program so that it
issues an EXPLAIN for that right before actually doing the query, and
then you could see if the plan is different on the slow days.


I'm at the end of my rope here.  Tried the following:

* Manually run ANALYZE on the table before running the sampling query.This does not help, there are still times when the sampling runs slowerby two orders of magnitude and disk IO is through the roof.* Bump work_mem for the query to 128MB (up from 32MB). This did nothelp. Also, no temp files were created in $PG/data/base/pgsql_tmp/, sowork_mem does not seem to be an issue.

* EXPLAINs look nearly identical whether the query runs quickly or slowly

The thing that gets me is, why does this query totally hose the entiredatabase? Other clients have a very hard time writing to the db whenthis sampling query is running slow, disk IO is maxxed out. This justdoesn't seem right. Why would a single pg backend strangle dbperformance to such an extent? Aren't there ways to throttle this back?

Due to the nature of the sampling (need to limit using severalparameters with a WHERE clause), I can't just generate random numbers toselect data that I need. Looks like I am stuck using ORDER BY RANDOM().The only other option at this point seems to be to implementTABLESAMPLE, probably starting with the academic work that Neil Conwaypublished (http://neilconway.org/talks/hacking/)



Mike

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] intermittant performance problem

Reply via email to