Re: [GENERAL] intermittant performance problem

2009-03-29 Thread marcin mank
I think (a part of) Your problem is that order by random() is O(N logN) complexity, while You are after O(N) . The solution (in pseudocode) random_sample(resultset,K): result := first K rows from resultset resultset.scrollto(K+1) p = K+1 while(resultset.hasMoreRows()) row =

Re: [GENERAL] intermittant performance problem

2009-03-29 Thread marcin mank
On Sun, Mar 29, 2009 at 10:24 AM, marcin mank marcin.m...@gmail.com wrote: I think (a part of) Your problem is that order by random() is O(N logN) complexity, while You are after O(N) . The solution (in pseudocode) [snip] OK, I may be guiding You the wrong way select g,g,g,g from

Re: [GENERAL] intermittant performance problem

2009-03-28 Thread Alban Hertroys
On Mar 27, 2009, at 8:38 PM, Mike Charnoky wrote: Hi Alban, I experimented with your sample() solution and was rather baffled by the results, as it was actually 2x-3x slower than doing an ORDER BY RANDOM() LIMIT n. I even precalculated the size of the result set, so that only one

Re: [GENERAL] intermittant performance problem

2009-03-26 Thread Alban Hertroys
On Mar 25, 2009, at 5:09 PM, Mike Charnoky wrote: Due to the nature of the sampling (need to limit using several parameters with a WHERE clause), I can't just generate random numbers to select data that I need. Looks like I am stuck using ORDER BY RANDOM(). The only other option at this

Re: [GENERAL] intermittant performance problem

2009-03-25 Thread Mike Charnoky
Mike Charnoky wrote: Scott Marlowe wrote: On Mon, Mar 9, 2009 at 8:21 PM, Mike Charnoky n...@nextbus.com wrote: The random sampling query is normally pretty snappy. It usually takes on the order of 1 second to sample a few thousand rows of data out of a few million. The sampling is

Re: [GENERAL] intermittant performance problem

2009-03-16 Thread Mike Charnoky
Scott Marlowe wrote: On Mon, Mar 9, 2009 at 8:21 PM, Mike Charnoky n...@nextbus.com wrote: The random sampling query is normally pretty snappy. It usually takes on the order of 1 second to sample a few thousand rows of data out of a few million. The sampling is consistently quick, too.

Re: [GENERAL] intermittant performance problem

2009-03-10 Thread Gregory Stark
Tom Lane t...@sss.pgh.pa.us writes: Mike Charnoky n...@nextbus.com writes: The sampling query which runs really slow on some days looks something like this: INSERT INTO sampled_data (item_name, timestmp, ... ) SELECT item_name, timestmp, ... ) FROM raw_data WHERE timestmp = ?

Re: [GENERAL] intermittant performance problem

2009-03-10 Thread Mike Charnoky
Gregory Stark wrote: Tom Lane t...@sss.pgh.pa.us writes: Mike Charnoky n...@nextbus.com writes: The sampling query which runs really slow on some days looks something like this: INSERT INTO sampled_data (item_name, timestmp, ... ) SELECT item_name, timestmp, ... ) FROM raw_data

[GENERAL] intermittant performance problem

2009-03-09 Thread Mike Charnoky
Hello, I'm looking for some insight on an intermittent PostgreSQL performance problem that has been very troublesome. Using PG 8.3.5 on a server running CentOS 5 2.6.18-8.el5 (Dual Xeon 2.00 GHz, 4 GB RAM, RAID-10 SCSI 600GB array). The problem in a nutshell is this: on some days, a nightly

Re: [GENERAL] intermittant performance problem

2009-03-09 Thread Scott Marlowe
On Mon, Mar 9, 2009 at 1:55 PM, Mike Charnoky n...@nextbus.com wrote: Hello, I'm looking for some insight on an intermittent PostgreSQL performance problem that has been very troublesome.  Using PG 8.3.5 on a server running CentOS 5 2.6.18-8.el5 (Dual Xeon 2.00 GHz, 4 GB RAM, RAID-10 SCSI

Re: [GENERAL] intermittant performance problem

2009-03-09 Thread Tom Lane
Mike Charnoky n...@nextbus.com writes: The sampling query which runs really slow on some days looks something like this: INSERT INTO sampled_data (item_name, timestmp, ... ) SELECT item_name, timestmp, ... ) FROM raw_data WHERE timestmp = ? and timestmp ? AND item_name=?

Re: [GENERAL] intermittant performance problem

2009-03-09 Thread Mike Charnoky
Yeah, I wish I didn't have to resort to using ORDER BY RANDOM(). I really wanted to use something like TABLESAMPLE, but that is not implemented in PostgreSQL. Unfortunately, I cannot use use the random sampling technique you mentioned, since I need to select samples across various strata of

Re: [GENERAL] intermittant performance problem

2009-03-09 Thread Mike Charnoky
The random sampling query is normally pretty snappy. It usually takes on the order of 1 second to sample a few thousand rows of data out of a few million. The sampling is consistently quick, too. However, on some days, the sampling starts off quick, then when the process starts sampling

Re: [GENERAL] intermittant performance problem

2009-03-09 Thread Scott Marlowe
On Mon, Mar 9, 2009 at 8:21 PM, Mike Charnoky n...@nextbus.com wrote: The random sampling query is normally pretty snappy.  It usually takes on the order of 1 second to sample a few thousand rows of data out of a few million.  The sampling is consistently quick, too.  However, on some days,