Jack - I believe that you are on the right track. If you think about it,
political pollsters randomly assess national populations with only a few
thousand samples, and it works pretty good as long as it isn't biased by bad
questions, lying respondents, etc. And you don't have those issues here.
        A more important issue if you wish to reduce the system resources
devoted to analyzing tables would be to consider how volatile the table is.
Some tables rarely change, while others change frequently. 
        Then another issue is the overall objective you are trying to
achieve. Is it to ensure query optimization? Is it a large population of
ad-hoc queries? You may want to consider some of the new stored execution
plans for queries that you don't want to suddenly start making some bad
optimizer decisions.
Dennis Williams
DBA
Lifetouch, Inc.
[EMAIL PROTECTED]


-----Original Message-----
Sent: Tuesday, May 21, 2002 4:44 PM
To: Multiple recipients of list ORACLE-L


Hi all,

Did some investigation about statistical sampling this
weekend since we are going to optimize our analyze
process soon, and would like some input from all you
orabrains on this one.

I opened a TAR with Oracle asking about the sampling
algorithm of stats collection, and they assured me it
was random. 

The goal of analyze...estimate is to collect stats
that are representative of the data population as a
whole using a given sample set. Since analyzing tables
takes up resources (does sorts to order the data for
investigation) the fewer rows you use in estimate, the
less system resources you use and the faster the
analyze will go.

Since our goal is to get as small a sample as possible
and still have stats that are representative, my
contention is that we could start by finding what the
margin of error will be for each sample size and gauge
our tolerance for it.

One standard way to calculate margin of error for a
given sample is by using this formula: 

M = 1/SQRT(N)

where:
M = margin of error
N=sample size

So, if we can tolerate stats that have a 1% a margin
of error (will deviate from representative of the
whole population by 1%), our sample size should be
10,000 rows. 

Also, a corollary (not a toyota corollary, though) to
this would be that the more rows you add to your
sample, the closer to representative your sample will
be. So, in order to test whether your sample is
representative enough, you could analyze using either
estimate 49% or compute, take a snapshot of the stats,
and then compare the stats from a 10,000 row estimate
to those. Then, add rows to your estimate until you
are satisfied with the stats.

This of course is a pie in the sky mathematical model,
but seems like a reasonable place to start with
testing. 

Input? Input? Buhler? Buhler?


/jack silvey


__________________________________________________
Do You Yahoo!?
LAUNCH - Your Yahoo! Music Experience
http://launch.yahoo.com
-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.com
-- 
Author: Jack Silvey
  INET: [EMAIL PROTECTED]

Fat City Network Services    -- (858) 538-5051  FAX: (858) 538-5051
San Diego, California        -- Public Internet access / Mailing Lists
--------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).
-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.com
-- 
Author: DENNIS WILLIAMS
  INET: [EMAIL PROTECTED]

Fat City Network Services    -- (858) 538-5051  FAX: (858) 538-5051
San Diego, California        -- Public Internet access / Mailing Lists
--------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).

Reply via email to