I think (a part of) your problem is that ORDER BY random() is O(N log N)
complexity, while you are after O(N).

The solution, in pseudocode (this is one-pass reservoir sampling):

random_sample(resultset, K):
    result := first K rows from resultset
    resultset.scrollto(K + 1)
    p := K + 1
    while resultset.hasMoreRows():
        row := resultset.nextRow()
        with probability K / p:
            result[random index in 1..K] := row
        p := p + 1
    return result
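The pseudocode above is standard reservoir sampling (Algorithm R). A minimal runnable sketch in Python, assuming the rows arrive as any iterable; the function name and list-based reservoir are mine, not from the thread:

```python
import random

def reservoir_sample(rows, k):
    """Uniformly sample k rows from an iterable in one pass, O(N) time."""
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Keep row i with probability k/(i+1), evicting a random slot.
            j = random.randint(0, i)
            if j < k:
                reservoir[j] = row
    return reservoir
```

Every row ends up in the sample with probability k/N, so the result is uniform without ever sorting the full set.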
On Sun, Mar 29, 2009 at 10:24 AM, marcin mank marcin.m...@gmail.com wrote:
> I think (a part of) your problem is that ORDER BY random() is O(N
> log N) complexity, while you are after O(N).
> The solution (in pseudocode)
[snip]
OK, I may be guiding you the wrong way:
select g,g,g,g from
On Mar 27, 2009, at 8:38 PM, Mike Charnoky wrote:
Hi Alban,
I experimented with your sample() solution and was rather baffled by
the results, as it was actually 2x-3x slower than doing an ORDER BY
RANDOM() LIMIT n. I even precalculated the size of the result set,
so that only one
On Mar 25, 2009, at 5:09 PM, Mike Charnoky wrote:
Due to the nature of the sampling (need to limit using several
parameters with a WHERE clause), I can't just generate random
numbers to select data that I need. Looks like I am stuck using
ORDER BY RANDOM(). The only other option at this
Mike Charnoky wrote:
Scott Marlowe wrote:
On Mon, Mar 9, 2009 at 8:21 PM, Mike Charnoky n...@nextbus.com wrote:
The random sampling query is normally pretty snappy. It usually takes on
the order of 1 second to sample a few thousand rows of data out of a few
million. The sampling is consistently quick, too.
Tom Lane t...@sss.pgh.pa.us writes:
Mike Charnoky n...@nextbus.com writes:
The sampling query which runs really slow on some days looks something
like this:
INSERT INTO sampled_data
  (item_name, timestmp, ... )
SELECT item_name, timestmp, ...
FROM raw_data
WHERE timestmp >= ?
Gregory Stark wrote:
[snip]
Hello,
I'm looking for some insight on an intermittent PostgreSQL performance
problem that has been very troublesome. Using PG 8.3.5 on a server
running CentOS 5 2.6.18-8.el5 (Dual Xeon 2.00 GHz, 4 GB RAM, RAID-10
SCSI 600GB array).
The problem in a nutshell is this: on some days, a nightly
On Mon, Mar 9, 2009 at 1:55 PM, Mike Charnoky n...@nextbus.com wrote:
[snip]
Mike Charnoky n...@nextbus.com writes:
The sampling query which runs really slow on some days looks something
like this:
INSERT INTO sampled_data
  (item_name, timestmp, ... )
SELECT item_name, timestmp, ...
FROM raw_data
WHERE timestmp >= ? AND timestmp < ?
  AND item_name = ?
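The shape of this query (range predicate plus an equality filter, then a random pick) can be tried end to end against SQLite, used here only because it ships with Python; the table name and columns are made up, and SQLite's RANDOM() stands in for PostgreSQL's random():

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_data (item_name TEXT, timestmp REAL, val REAL)")
conn.executemany(
    "INSERT INTO raw_data VALUES (?, ?, ?)",
    [("item_a", float(t), t * 0.5) for t in range(10000)],
)

# Range predicate plus ORDER BY RANDOM(): the sort costs O(N log N)
# over the rows matching the WHERE clause.
sample = conn.execute(
    """
    SELECT item_name, timestmp, val
    FROM raw_data
    WHERE timestmp >= ? AND timestmp < ?
      AND item_name = ?
    ORDER BY RANDOM()
    LIMIT 100
    """,
    (1000.0, 9000.0, "item_a"),
).fetchall()
```

The sort over all matching rows is exactly the O(N log N) cost being discussed upthread.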
Yeah, I wish I didn't have to resort to using ORDER BY RANDOM(). I
really wanted to use something like TABLESAMPLE, but that is not
implemented in PostgreSQL. Unfortunately, I cannot use the random
sampling technique you mentioned, since I need to select samples across
various strata of
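Needing samples across several strata doesn't by itself rule out a one-pass approach: one can keep a separate reservoir per stratum. A sketch under that assumption (the stratum key function and per-stratum sample size k are mine, not from the thread):

```python
import random
from collections import defaultdict

def stratified_reservoir(rows, key, k):
    """One pass over rows, keeping a uniform k-sample per stratum."""
    reservoirs = defaultdict(list)   # stratum -> sampled rows
    seen = defaultdict(int)          # stratum -> rows seen so far
    for row in rows:
        s = key(row)
        seen[s] += 1
        if len(reservoirs[s]) < k:
            # Fill this stratum's reservoir with its first k rows.
            reservoirs[s].append(row)
        else:
            # Keep the row with probability k / seen[s].
            j = random.randint(0, seen[s] - 1)
            if j < k:
                reservoirs[s][j] = row
    return dict(reservoirs)
```

Each stratum's sample is uniform within that stratum, and the whole thing is still a single O(N) scan.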
The random sampling query is normally pretty snappy. It usually takes
on the order of 1 second to sample a few thousand rows of data out of a
few million. The sampling is consistently quick, too. However, on some
days, the sampling starts off quick, then when the process starts
sampling