Re: [GENERAL] Statistical Analysis

2000-07-25 Thread Guillaume Perréal

Steve Heaven wrote:
> 
> At 20:18 24/07/00 -0400, you wrote:
> >Sigh, forgot to mention the solution.  There's an undocumented function:
> >
> > * bool oidrand (oid o, int4 X)-
> > *   takes in an oid and a int4 X, and will return 'true'
> > * about 1/X of the time.
> >
> >typically used like this:
> >
> >-- select roughly 1/10 of the tuples
> >SELECT * FROM onek WHERE oidrand(onek.oid, 10);
> >
> 
> It doesnt seem to work as you explain.
> For a value of 1 you expect (nearly) all the tuples and two should return
> half, but that not what I'm finding.
> 
> galore=> select count(*) from topten where room='HI';
> count
> -
>14
> (1 row)
> 
> galore=> SELECT * FROM topten WHERE room='HE' and oidrand(topten.oid, 1);
> type|data |norder|room  |grp
> +-+--+--+---
> B   |0764552503   | 1|HE|
> B   |0751327190   | 1|HE|
> B   |0718144392   |  |HE|
> B   |0500280754   |  |HE|
> B   |0028610091   | 1|HE|
> (5 rows)
> galore=> SELECT * FROM topten WHERE room='HE' and oidrand(topten.oid, 2);
> type|data |norder|room  |grp
> +-+--+--+---
> B   |0764552503   | 1|HE|
> B   |0751327190   | 1|HE|
> B   |0718144392   |  |HE|
> (3 rows)
> galore=> SELECT * FROM topten WHERE room='HE' and oidrand(topten.oid, 7);
> type|data |norder|room  |grp
> +-+--+--+---
> B   |0751327190   | 1|HE|
> B   |0718144392   |  |HE|
> (2 rows)
> --
> thorNET  - Internet Consultancy, Services & Training
> Phone: 01454 854413
> Fax:   01454 854412
> http://www.thornet.co.uk

Isn't it because oidrand evals as 'random() < 1/X' ? or maybe 'random() <
1/(X+1)' ?

-- 
Guillaume Perréal - Stagiaire MIAG
Cemagref (URH), Lyon, France
Tél: (+33) 4.72.20.87.64



Re: [GENERAL] Statistical Analysis

2000-07-24 Thread Stephan Szabo

Are you grabbing a set of rows to work on in an outside app?

You may be able to get a smaller random set with:
select  from  order by random() limit 
But this will pretty much force a sort step [and if you're not limiting the
rows with a where clause, probably a full sequential scan] and could
be very expensive depending on the number or matching rows for any
limiting clauses you have.  You'd have to play with it in practice to see
if it's any faster.

- Original Message -
From: "Nathan Barnett" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, July 24, 2000 12:20 PM
Subject: [GENERAL] Statistical Analysis


> I am having to perform a large data analysis query fairly frequently and
the
> execution time is not exceptable, so I was looking at doing a statictical
> sample of the data to get fairly accurate results.  Is there a way to
> perform a query on a set number of random rows instead of the whole
dataset?
> I have looked through the documentation for a function that would do this,
> but I have not seen any.  If this is a RTFM type question, then feel free
to
> tell me so and point me in the right direction because I just haven't been
> able to find any info on it.






Re: [GENERAL] Statistical Analysis

2000-07-24 Thread Mitch Vincent

I don't think it's random (well, I'm sure it's not) but you could use LIMIT
to get a smaller number of results...

*shrug* just an idea.

Good luck!

-Mitch

- Original Message -
From: "Nathan Barnett" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, July 24, 2000 3:20 PM
Subject: [GENERAL] Statistical Analysis


> I am having to perform a large data analysis query fairly frequently and
the
> execution time is not exceptable, so I was looking at doing a statictical
> sample of the data to get fairly accurate results.  Is there a way to
> perform a query on a set number of random rows instead of the whole
dataset?
> I have looked through the documentation for a function that would do this,
> but I have not seen any.  If this is a RTFM type question, then feel free
to
> tell me so and point me in the right direction because I just haven't been
> able to find any info on it.
>
> Thanks ahead of time.
>
>
> ---
> Nathan Barnett
>
>




[GENERAL] Statistical Analysis

2000-07-24 Thread Nathan Barnett

I am having to perform a large data analysis query fairly frequently and the
execution time is not exceptable, so I was looking at doing a statictical
sample of the data to get fairly accurate results.  Is there a way to
perform a query on a set number of random rows instead of the whole dataset?
I have looked through the documentation for a function that would do this,
but I have not seen any.  If this is a RTFM type question, then feel free to
tell me so and point me in the right direction because I just haven't been
able to find any info on it.

Thanks ahead of time.


---
Nathan Barnett