RE: OEM - Automation of Start of Collection for Performance Reports
How can the start of collection for ALL (multiple) performance reports be automated in OEM 9.2 before a transaction run? There exist about 100 such performance reports in OEM. Manually starting collection on each of these individually in OEM takes too much effort; otherwise, collection has to be started well before the actual transaction run, thereby capturing a lot of unnecessary data. Are we missing some way? NOTE - Collection for all these STARTed performance reports can be STOPPED with a single click in OEM, though. Thanks -- Please see the official ORACLE-L FAQ: http://www.orafaq.net -- Author: VIVEK_SHARMA INET: [EMAIL PROTECTED] Fat City Network Services -- 858-538-5051 http://www.fatcity.com San Diego, California -- Mailing list and web hosting services - To REMOVE yourself from this mailing list, send an E-Mail message to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing).
Re: OEM - Automation of Start of Collection for Performance Reports
I have seen some folks record the performance data and then play it back in OEM. I think it is there in the Performance Manager add-in. Is that what you want? How different do you expect this to be from STATSPACK? Babu

VIVEK_SHARMA wrote (02/25/03 09:38 PM, to ORACLE-L): How can Start of Collection of Performance Reports be automated in OEM 9.2, before a Transactions' Run? [...]
Passing data in a collection via a dblink
Hi all, A developer wants to pass a collection (varray) between two stored procs on two databases via a dblink. Can we declare a collection type of a remote database locally? This is on 8.1.7.2. Stephane Paquette Administrateur de bases de donnees / Database Administrator Standard Life www.standardlife.ca Tel. (514) 925-7187 [EMAIL PROTECTED]
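[A sketch of the usual workaround, not tested on 8.1.7: SQL object types generally cannot be referenced across a database link in that release, so a common approach is to declare an identical PL/SQL collection type in a package spec on both databases and call the remote proc through the link. All names below are hypothetical.]

```sql
-- Create the same package spec on BOTH databases so the
-- collection type is defined identically on each side.
CREATE OR REPLACE PACKAGE xfer_pkg AS
  TYPE num_list IS VARRAY(100) OF NUMBER;
  PROCEDURE load_values (p_vals IN num_list);
END xfer_pkg;
/
-- On the calling database, invoke the remote procedure over the
-- dblink. Whether the collection parameter is accepted depends on
-- the release; if it fails on 8.1.7, the usual fallback is to pass
-- scalars in a loop or stage the rows in a table read over the link.
DECLARE
  v xfer_pkg.num_list := xfer_pkg.num_list(1, 2, 3);
BEGIN
  xfer_pkg.load_values@remote_db(v);  -- remote_db is a hypothetical dblink
END;
/
```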
OEM - Automation of Start of Collection for Performance Reports
How can the start of collection of performance reports be automated in OEM 9.2 before a transaction run? There exist about 100 such performance reports. Manually starting collection of these individually takes too much effort; otherwise, collection has to be started well before the actual transaction run, thereby capturing a lot of unnecessary data. NOTE - Collection for all these STARTed performance reports can be STOPPED with a single click. -- Author: VIVEK_SHARMA
Oracle Trace Collection Services
I have a third-party app that runs on Oracle 7.3.4 on Windoze. The app is getting some error, and the vendor recommends that we uninstall the Oracle Trace Collection services from each client. If this is removed from the client, will I lose my ability to run traces on a client's session? Dave -- Author: Farnsworth, Dave
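[For what it's worth, Oracle Trace (the Collection Services) is a separate facility from ordinary server-side SQL trace, so removing the client service should not stop a DBA from tracing a session on the server. A sketch, using the undocumented but widely used DBMS_SYSTEM call; sid/serial# values are illustrative.]

```sql
-- Find the target session:
SELECT sid, serial# FROM v$session WHERE username = 'SCOTT';

-- Enable, then later disable, SQL trace for that session (run as a DBA;
-- the trace file appears in user_dump_dest on the server):
EXEC dbms_system.set_sql_trace_in_session(12, 345, TRUE);
EXEC dbms_system.set_sql_trace_in_session(12, 345, FALSE);
```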
RE: Statistical sampling and representative stats collection
-Original Message- From: John Kanagaraj Sent: Tuesday, May 28, 2002 1:49 PM To: Multiple recipients of list ORACLE-L Subject: RE: Statistical sampling and representative stats collection

Btw, there is an option in 9i, DBMS_STATS.AUTO_SAMPLE_SIZE. Has anyone tried this out? Or know how it works? I would be interested in that as well.

[...] for these stats on non-indexed columns? If so, I disagree on this point, even if it is general advice and not a rule. Stats on non-indexed columns can play a *large* role in CBO decisions.

I agree with you, although I do have to contend that the effect is not very pronounced in all databases. This was revealed in some depth in Wolfgang's paper at IOUG, where he was able to actually determine this in a 10053 trace, and it was an eye-opener for me. The issue I have with this is that the default number of buckets is 2, and that is nowhere near what is needed. On the other hand, indiscriminately increasing the bucket size would leave you with a _large_ number of histograms and result in the 'row cache lock' latch being taken out more than it should be (as well as add to the cost of parsing). I would only consider gathering this info, or customizing the stats-gathering process in general, on a case-by-case basis and only when needed. Due to the nature of the data and queries, you might find the need to gather the non-indexed column stats on a single table and 1 column, or maybe 2 tables, or maybe many tables and columns, or maybe none at all. You already know, then, how stats on non-indexed columns can play a role. But since I received a few back-channel emails asking for examples of how stats on the non-indexed columns make a difference, I'll include the following extreme example that illustrates. By no means am I saying you should always gather them -- just on an as-needed basis.
And I haven't included the 10053 traces since that could get very tedious ;-)

Say we have two tables, A and B, each consisting of two columns, A and B. Column A is the PK on each; column B is not indexed. So:

Table A: 50,000 rows                      Table B: 50,000 rows
Column A.A: (PK) values 1 thru 50,000
Column B.A: (PK) values 1 thru 50,000
Column A.B: not indexed, values 1 thru 25,000, each occurring twice, e.g. 1,1,2,2,3,3...
Column B.B: not indexed, values 0 and 1, each occurring 25,000 times

Note that I am not really even dealing with any skewness at this point; this is an even distribution of values. Here are 4 sample queries:

1. Select * From A, B WHERE A.A = B.A and A.B = 5
   (filters down to 2 rows; I want to drive by A in an NL fashion)
2. Select * From A, B WHERE A.A = B.A and A.B = 5 and B.B = 1
   (A.B = 5 filters down to 2 rows, so I still want to drive by A in an NL fashion; B.B = 1 filters down to 25,000 rows if treated standalone)
3. Select * From A, B WHERE A.A = B.A and B.B = 1
   (filters down to 25,000 rows; I want FTS's and an HJ)
4. Select * From A, B WHERE A.A = B.A and B.B = 27000
   (no rows meet this; I want to drive the query by table B)

With a basic compute for table for all indexed columns, we get hash joins with all four -- in the case of the first two, driving by table A, and the next two driving by table B. There were no stats on the non-indexed column. This isn't what I really wanted for all 4 queries. Now, if I simply do an analyze table compute statistics, in which case stats are gathered for both columns A and B, you get the default bucket and two rows for each column over in DBA_TAB_HISTOGRAMS. In the case of the first 2 queries, I get a nested loops plan driven by table A using an FTS, and an index lookup into table B. This is what I want, because of the really restrictive filtering criteria on table A's non-indexed column B, resulting in two rows being selected and two indexed lookups into table B. The stats on the non-indexed column helped the CBO make this decision.
In the case of queries 3 and 4, I would get a MERGE JOIN, using a full index scan on A's PK to get each row, for the purpose of feeding the data in sorted order (I might have preferred an FTS and actually doing the sort!), and an FTS on table B. Note that by increasing the HASH_AREA_SIZE, queries 3 and 4 simply went with FTS's and an HJ, driving by table B. I would have hoped that in the case of query 4 the CBO would have chosen to drive by table B, using a nested loops index approach into table A. Upon no rows being returned from table B, table A would not have been accessed at all. So, we go back and analyze specifying a SIZE for column B. Queries 1 and 2 use nested loops driving by table A, and the restrictive filtering criteria result in 2 indexed lookups into table B. This is what I want. In the case of query 3, and having dropped the hash area size back down, it does FTS's and an HJ driving by table B. This is what I want. And in the case of query 4, it uses a nested loops approach, driving
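[The setup and the successive analyze passes described above can be sketched roughly as follows, using the old ANALYZE syntax of that era; the SIZE value is illustrative, not what Larry actually used.]

```sql
-- Test tables (population of the 50,000 rows each omitted).
CREATE TABLE a (a NUMBER PRIMARY KEY, b NUMBER);
CREATE TABLE b (a NUMBER PRIMARY KEY, b NUMBER);

-- First pass: stats on the table and indexed columns only.
ANALYZE TABLE a COMPUTE STATISTICS FOR TABLE FOR ALL INDEXED COLUMNS;
ANALYZE TABLE b COMPUTE STATISTICS FOR TABLE FOR ALL INDEXED COLUMNS;

-- Second pass: a plain compute also records the default 2-bucket
-- entries (effectively min/max) for every column, including B.
ANALYZE TABLE a COMPUTE STATISTICS;
ANALYZE TABLE b COMPUTE STATISTICS;

-- Third pass: an explicit SIZE builds a real histogram on the
-- non-indexed column B.B.
ANALYZE TABLE b COMPUTE STATISTICS FOR COLUMNS b SIZE 10;
```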
RE: Statistical sampling and representative stats collection
Based on the scarcity of previous responses to emails on this list, it seems that histograms are not that widely used throughout the industry. I'm not sure why. I've used them in the past, but only in very specific instances and certainly not for all tables/columns. Sometimes just 1 or 2, sometimes 15 or 20. And only in those cases where needed. Cherie Machler Oracle DBA Gelco Information Network
RE: Statistical sampling and representative stats collection
Sorry, I did not use anything that sophisticated (nor accurate?):

Uniform distribution = 1 / (number of unique values)
Max distribution = (the highest occurrence of a value) / (number of records in the table)

Tom

-Original Message- Sent: Tuesday, May 28, 2002 4:34 PM To: Multiple recipients of list ORACLE-L

tom, This is interesting. How did you determine max distribution and uniform distribution? Did you use standard deviation and variance? regards, jack silvey

--- Terrian, Tom wrote: John, I know in a previous job, we determined that histograms were not worth it. The following is from a test that we performed:

Table: F_tab, Field: P_code -- Uniform Distribution 0.65%, Max Distribution 18%

Therefore, from the above numbers, the field should be a good candidate for histograms, so I did the following tests. Based on the following combinations of statistics and histograms, I timed how fast a sample query ran:

P_Code    w/o stats   w/ stats,       w/ stats,     w/ stats,
                      no histograms   100 buckets   50 buckets
-------   ---------   -------------   -----------   ----------
0101      342 secs.   428             385           500
0101      406         416             326           340
0101      391         390             327           359
6501      458         490             337           342
6501      475         380             358           490
6501      518         395             326           354
-------   ---------   -------------   -----------   ----------
Total     1730        1629            1348          2085   (w/o high and low values)
Avg/run   7min 12s    6min 47s        5min 37s      5min 51s

However, to create the histogram it takes 1hr 42min. Too long for the benefit that we gain.

Tom

-Original Message- Sent: Tuesday, May 28, 2002 3:25 PM To: Multiple recipients of list ORACLE-L

Ian, John, are you saying to create histograms on all indexed columns, or just the ones with distributions which are skewed, and also for ones which, although symmetric in distribution, have some values much more prevalent than others?

To keep this simplistic, I wouldn't use histograms (or let it default to 2) *unless* hardcoded values are known to be used, at least in 8i. The situation becomes different in 9i, as the CBO is able to peek into these values even when bind variables are used.
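[Tom's two figures can be computed straight from the table. A sketch using his F_tab/P_code example names:]

```sql
-- Uniform distribution = 1 / number of distinct values.
-- Max distribution     = highest per-value count / total row count.
SELECT 1 / COUNT(DISTINCT p_code) AS uniform_dist,
       MAX(cnt) / SUM(cnt)        AS max_dist
FROM  (SELECT p_code, COUNT(*) cnt
       FROM   f_tab
       GROUP  BY p_code);
```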
(I think there is a script out there on Steve Adams' site called 'Histogram Helper' which can suggest this for you.) However, as Larry mentioned in a previous email, the CBO is influenced by distributions in non-indexed columns. The issue here is that the number of buckets really matters, and the default of 2 can influence incorrect decisions (haven't we all seen 'em? :) So what I am essentially saying is this: use COMPUTE and histograms when you have to, but don't sweat over it unless it pinches ya. And how do we determine it is pinching? V$SYSSTAT is a pretty good indicator (at the risk of being called part of the 'ratios' group): Is the ratio of 'table scan blocks gotten' to 'table scan rows gotten' acceptable? Is the number of table scans acceptable? Is the number of 'db block gets' too much, as compared to 'physical reads'? I am in the process of determining the overheads of having 'too many' histograms - I am observing some 'row cache lock' latch waits and think that this could have been the result of too many histograms. Hope to post some info back to the list soon. John Kanagaraj Oracle Applications DBA DBSoft Inc (W): 408-970-7002 The manuals for Oracle are here: http://tahiti.oracle.com The manual for Life is here: http://www.gospelcom.net ** The opinions and statements above are entirely my own and not those of my employer or clients **
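[The raw counters behind the V$SYSSTAT checks John describes can be pulled in one go; statistic names as they appear in 8i/9i:]

```sql
-- Counters for the full-table-scan and block-get ratios above.
SELECT name, value
FROM   v$sysstat
WHERE  name IN ('table scans (short tables)',
                'table scans (long tables)',
                'table scan blocks gotten',
                'table scan rows gotten',
                'db block gets',
                'physical reads');
```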
RE: Statistical sampling and representative stats collection
Have used them on a 'for all indexed columns' basis, and they make a massive difference on heavily skewed data - particularly a sort of waiting-to-be-processed flag which has only about 5 distinct values, but the ones we want to pick make up only about 0.01%. Haven't used them on all columns, as we don't often filter on non-indexed columns. Iain Nicoll
RE: Statistical sampling and representative stats collection
Larry, Thanks for taking the time to write this lengthy reply. I've never seen this particular information on histograms anywhere. This has been very informative. Cherie Machler Oracle DBA Gelco Information Network

(Quoting Larry Elkins' reply of 05/29/02 05:57, RE: Statistical sampling and representative stats collection.)
RE: Statistical sampling and representative stats collection
We have a Y/N (yes/no) processing flag column on a very large table. As an alternative to a bitmapped index, we changed the code to manipulate Y/null values, because 0.1% of values are 'Y' and the others are null. The resulting index is quite small, but we do rebuild it periodically. This has worked quite well for us. Steve Orr Bozeman, MT
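[The trick works because entirely-NULL keys are not stored in a single-column B*-tree index, so the index holds only the ~0.1% of 'Y' rows. A sketch with hypothetical table and index names:]

```sql
-- Hypothetical names; NULL means "already processed" and is not indexed.
CREATE TABLE work_queue (
  id      NUMBER PRIMARY KEY,
  payload VARCHAR2(100),
  pending VARCHAR2(1)   -- 'Y' = waiting to be processed, NULL = done
);
CREATE INDEX work_queue_pending_idx ON work_queue (pending);

-- Finds the few pending rows via a very small index:
SELECT id FROM work_queue WHERE pending = 'Y';

-- "Processing" a row removes its key from the index:
UPDATE work_queue SET pending = NULL WHERE id = 42;
```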
RE: Statistical sampling and representative stats collection
I remember that Steve Orr of this list used Histograms and enjoyed huge increase in performance.

Hi John, that was over a year ago; your memory is too good. :-) Haven't worked with histograms lately, but here are some before-and-after stats from that former histogram implementation:
- Rows scanned per second went from 300,000 to 500;
- consistent reads per second went from 75,000 to 500;
- CPU usage went from 85-100% to 5%;
- the physical I/O rate dropped significantly.
When histograms help, the difference can be dramatic. This was from a single query that was executed via dbms_job every minute. After ID'ing the problem query, I went through a tuning exercise at length, then found histograms to be the magic bullet that fixed everything. So, rather than the shotgun approach of calculating histograms on everything, I think the more surgical technique of finding the bottlenecks (possibly using the wait interface) still applies. Of course I'm preaching to the choir. :-) Steve Orr Bozeman, MT

-Original Message- Sent: Tuesday, May 28, 2002 4:54 PM To: Multiple recipients of list ORACLE-L Importance: High

Cherie, In my experience, histograms seem a bit hit-or-miss, but in the cases where they've worked, the performance improvement has been good or even fantastic. In the cases where they haven't helped, I've simply removed them. As I observed before, histograms help only when literal predicates are used (until 9i). Steve Adams has this to say about histograms:

Quote: Histograms enable the optimizer to more accurately estimate the selectivity of literal predicates against columns with skewed data distributions. This can help the optimizer to choose a better access path, and possibly join order, for certain queries than might otherwise be the case. However, redundant histograms on columns with uniformly distributed data, and unduly large histograms on columns with skewed data distributions, just increase the CPU cost of parsing and waste space in the shared pool.
Therefore histograms should only be created where they are beneficial, and should not be any larger than necessary. Please note that histograms on columns that are not indexed can nevertheless be beneficial, because they inform the optimizer of the cardinality of their table for the query and can thus influence the join order. Unquote.

Based on the scarcity of previous responses to emails on this list, it seems that histograms are not that widely used throughout the industry. I'm not sure why.

I remember that Steve Orr of this list used Histograms and enjoyed huge increase in performance.

The reason why the industry hasn't used histograms as much as it should have is a combination of lack of knowledge, FUD, and just plain lethargy. On the other hand, overuse also has its downsides. Btw, has anyone tracked V$ROWCACHE, which provides a fair idea of the row cache (or DD) portion of the shared pool? The figures against dc_histogram_data and dc_histogram_defs may provide some clues about what's going on within. John Kanagaraj Oracle Applications DBA DBSoft Inc (W): 408-970-7002 The manuals for Oracle are here: http://tahiti.oracle.com The manual for Life is here: http://www.gospelcom.net ** The opinions and statements above are entirely my own and not those of my employer or clients **
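[The V$ROWCACHE check John suggests is a one-liner:]

```sql
-- Dictionary-cache activity for the histogram entries.
SELECT parameter, gets, getmisses, modifications
FROM   v$rowcache
WHERE  parameter LIKE 'dc_histogram%';
```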
RE: Statistical sampling and representative stats collection
One problem I have with histograms is having to calculate the proper number of buckets. Thanks to Steve Adams I have a program to do just that. I keep such information in the dbms_stats table so it can be used by dbms_stats when the table is reanalyzed. The proper number of buckets may change over time, and it's also possible the frequency distributions of the values for a column may change enough that histograms no longer help. It is, however, an onerous task to recalculate them. How often are people checking the frequency distribution and the bucket counts? I try to do this every three months. Ian MacGregor Stanford Linear Accelerator Center [EMAIL PROTECTED]

(Quoting Steve Orr's message of Wednesday, May 29, 2002 7:39 AM, included earlier in this thread.)
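[Ian's recalculated bucket counts can be applied when re-gathering via DBMS_STATS; table, column, and SIZE values below are illustrative only.]

```sql
-- Re-gather stats with a specific bucket count for one column.
BEGIN
  dbms_stats.gather_table_stats(
    ownname    => USER,
    tabname    => 'F_TAB',
    method_opt => 'FOR COLUMNS p_code SIZE 50'
  );
END;
/
```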
RE: Statistical sampling and representative stats collection
Thanks Larry for this excellent and painstaking explanation.

John

-----Original Message-----
From: Larry Elkins [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, May 29, 2002 4:59 AM
To: Multiple recipients of list ORACLE-L
Subject: RE: Statistical sampling and representative stats collection

> Btw, there is an option in 9i - DBMS_STATS.AUTO_SAMPLE_SIZE. Has anyone
> tried this out? Or know how it works?

I would be interested in that as well.

-- 
Author: John Kanagaraj
RE: Statistical sampling and representative stats collection
Hi Larry,

I sure wasn't disappointed to receive this note from a CBO/SQL guru such as yourself. Thanks for the ideas.

> My question here is would this database and your findings be applicable to
> other databases and the nature of their data? I wouldn't

The nature of the data and the variances in between are so great that I would never try to generalize everything. However, the number (or percentage) of generic cases is significant compared to the specific, and that is probably what I was trying to get at. In many cases, the act and cost of collecting the statistics is itself huge compared to what is actually used out of all that data.

> since the characteristics can be so different, but, if I'm reading you
> correctly, you aren't saying that ESTIMATE is always the only way. But, the

That is right. I think it was a case of moving on from carpet bombing (either ESTIMATE only or COMPUTE only) to some sort of precision bombing. There is still some collateral damage (as seen below).

> Sunday. Someone accidentally analyzed the schema at 30% on Monday and a lot
> of things went down the toilet. Going back to COMPUTE fixed things. Then
> again, maybe a 10% ESTIMATE would have fixed things. Jack and I both work
> with a guy who has talked about COMPUTE resulting in undesired plans; 10%
> did as well. They got the desired plans by going to 1%. So, even if one
> agrees that we don't necessarily have to COMPUTE, and in many (probably
> most?) cases we don't, there is still a lot of testing to be done to find
> the best estimate percent, and this could very well be different for
> various objects. And I think that's the battle we all face -- what is the
> best sampling percentage. And right now, it still seems to be done on a
> trial and error basis. I have some ideas on how one might attack this in
> an automated fashion, but it's still a *very* rough idea that I need to
> bounce off a few cohorts.

I think you summarized it very well.
How does one figure out the best percentage, given that we will NOT be able to complete a COMPUTE within the period allowed? When does one stop experimenting (significant DBA cost), and how does one make sure that the apple cart is not upset by new data patterns? I would be very interested in estimating the value in some automated fashion. I would be honored to be part of that bunch of cohorts!

Btw, there is an option in 9i - DBMS_STATS.AUTO_SAMPLE_SIZE. Has anyone tried this out? Or know how it works?

> for these stats on non-indexed columns? If so, I disagree on this point,
> even if it is general advice and not a rule. Stats on non-indexed columns
> can play a *large* role in CBO decisions. I'm not going to go into details
> and examples here illustrating that, but those stats can still help decide
> the driving table, the join methods between tables, etc. I built a sample
> case some time back to illustrate the importance of gathering these
> non-indexed column stats. Now, it might not be important for all systems,
> but if you are ever using indexed columns, and still specifying criteria
> on non-indexed columns, the gathering of stats on the non-indexed columns
> could be *very* important. I can send you more details back-channel if you
> are interested.

I agree with you, although I do have to contend that the effect is not very pronounced in all databases. This was revealed in some depth in Wolfgang's paper at IOUG, where he was able to actually determine this in a 10053 trace, and it was an eye opener for me. The issue I have with this is that the default number of buckets is 2, and that is nowhere near what is needed. On the other hand, indiscriminately increasing the bucket size would leave you with a _large_ number of histograms and result in the 'row cache lock' latch being taken out more than it should be (as well as add to the cost of parsing).
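[Ed.: for reference, the 9i option John asks about is used by passing the package constant as the estimate percent; a minimal sketch with made-up owner/table names:]

```sql
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'APPS',       -- hypothetical owner
    tabname          => 'BIG_TABLE',  -- hypothetical table
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    cascade          => TRUE);        -- gather index stats too
END;
/

-- The sample size Oracle actually settled on is visible afterwards:
SELECT num_rows, sample_size
  FROM dba_tables
 WHERE owner = 'APPS' AND table_name = 'BIG_TABLE';
```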
> It's an interesting idea and something certainly worth playing around with.

Yep - and I did add a YMMV :) I would love to see this thread grow.

John Kanagaraj

-- 
Author: John Kanagaraj
RE: Statistical sampling and representative stats collection
Ian,

> John, are you saying to create histograms on all indexed columns, or just
> the ones with distributions which are skewed, and also for ones which,
> although symmetric in distribution, have some values much more prevalent
> than others?

To keep this simplistic, I wouldn't use histograms (or would let the bucket count default to 2) *unless* hardcoded values are known to be used, at least in 8i. The situation becomes different in 9i, as the CBO is able to peek into these values even when bind variables are used. (I think there is a script on Steve Adams' site called 'Histogram Helper' which can suggest this for you.) However, as Larry mentioned in a previous email, the CBO is influenced by distributions in non-indexed columns. The issue here is that the number of buckets really matters, and the default of 2 can lead to incorrect decisions (haven't we all seen 'em? :)

So what I am essentially saying is this: use COMPUTE and histograms when you have to, but don't sweat over it unless it pinches ya. And how do we determine it is pinching? V$SYSSTAT is a pretty good indicator (at the risk of being called part of the 'ratios' group): Is the ratio of 'table scan blocks gotten' to 'table scan rows gotten' acceptable? Is the number of table scans acceptable? Is 'db block gets' too high compared to 'physical reads'?

I am in the process of determining the overheads of having 'too many' histograms - I am observing some 'row cache lock' latch waits and think this could be the result of too many histograms. Hope to post some info back to the list soon.
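[Ed.: the 'is it pinching?' checks above boil down to a handful of V$SYSSTAT counters, e.g.:]

```sql
-- Full-scan and buffer activity counters John refers to:
SELECT name, value
  FROM v$sysstat
 WHERE name IN ('table scans (short tables)',
                'table scans (long tables)',
                'table scan rows gotten',
                'table scan blocks gotten',
                'db block gets',
                'physical reads');
```

A rapidly growing 'table scans (long tables)' figure, or an unacceptable blocks-to-rows ratio from the two 'table scan ... gotten' counters, would suggest the optimizer is favouring full scans where an index (or a histogram that makes an index attractive) might help.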
John Kanagaraj

-- 
Author: John Kanagaraj
RE: Statistical sampling and representative stats collection
John,

I know in a previous job we determined that histograms were not worth it. The following is from a test that we performed:

***********************************
Table: F_TAB     Uniform distribution    Max distribution
Field: P_CODE          0.65%                   18%

From the above numbers, the field should be a good candidate for histograms, so I ran the following tests. Based on the following combinations of statistics and histograms, I timed how fast a sample query ran:

              w/o stats    w/ stats        w/ stats      w/ stats
P_Code                     no histograms   100 buckets   50 buckets
------        ---------    -------------   -----------   ----------
0101          342 secs.    428             385           500
0101          406          416             326           340
0101          391          390             327           359
6501          458          490             337           342
6501          475          380             358           490
6501          518          395             326           354
              ---------    -------------   -----------   ----------
Total secs.   1730         1629            1348          2085
(w/o high and low values)

Avg time      7min 12sec   6min 47sec      5min 37sec    5min 51sec
per run

However, creating the histogram takes 1hr 42min. Too long for the benefit that we gain.
***********************************

Tom

-----Original Message-----
Sent: Tuesday, May 28, 2002 3:25 PM
To: Multiple recipients of list ORACLE-L
-- 
Author: Terrian, Tom
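[Ed.: for anyone wanting to reproduce a test like Tom's, the bucket counts are set per column at analyze time; F_TAB/P_CODE as in his example:]

```sql
-- Baseline: table and index stats only, no histogram on P_CODE:
ANALYZE TABLE f_tab COMPUTE STATISTICS FOR TABLE FOR ALL INDEXES;

-- 100-bucket histogram on P_CODE (re-run the timed query after each):
ANALYZE TABLE f_tab COMPUTE STATISTICS FOR COLUMNS p_code SIZE 100;

-- 50-bucket histogram (replaces the previous column stats):
ANALYZE TABLE f_tab COMPUTE STATISTICS FOR COLUMNS p_code SIZE 50;

-- Verify what was built:
SELECT column_name, COUNT(*) AS buckets
  FROM user_tab_histograms
 WHERE table_name = 'F_TAB'
 GROUP BY column_name;
```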
RE: Statistical sampling and representative stats collection
I have had some really good experiences with using histograms. They didn't always produce the improvements that I expected, but in many cases I saw 10 times, 100 times, or even 1,000 times faster execution after adding histograms. I don't have the specifics, but these were cases where the data was very heavily skewed and the column that the histogram was created on was included in the WHERE clause of the SELECT statement and compared to a specific value with an equals sign (bind variables were not used).

In my experience, histograms seem a bit hit or miss, but in the cases where they've worked, the performance improvement has been good or even fantastic. In the cases where they haven't helped, I've simply removed them.

Based on the scarcity of previous responses to emails on this list, it seems that histograms are not that widely used throughout the industry. I'm not sure why.

Cherie Machler
Oracle DBA
Gelco Information Network
RE: Statistical sampling and representative stats collection
Tom,

This is interesting. How did you determine max distribution and uniform distribution? Did you use standard deviation and variance?

regards,
jack silvey

--- Terrian, Tom [EMAIL PROTECTED] wrote:
Re: Statistical sampling and representative stats collection
Fantastic research, John!

----- Original Message -----
To: Multiple recipients of list ORACLE-L
Sent: Thursday, May 23, 2002 7:13 PM

Jack,

I conducted some tests of ANALYZE ESTIMATE vs COMPUTE for my IOUG paper and arrived at the following (cut-and-paste of the relevant parts of the paper):

--- Begin Quote ---

MYTH: COMPUTE IS BETTER THAN ESTIMATE

This one actually generates endless debate, so we will not take a firm stand either way. Rather, we will present some figures that throw light on the issue and allow us to step back and look at the situation. The problem with COMPUTE is that it has to scan the entire table, sort it, and figure out the exact data distribution. ESTIMATE, on the other hand, steps through samples of the data, sorting and analyzing only a portion of it.

In a recent test of the effectiveness of COMPUTE versus ESTIMATE on a static clone of a reasonably large Oracle Apps database, statistics were generated and stored for both COMPUTE and ESTIMATE. The database consisted of about 3,300 tables and 6,000 indexes and occupied approximately 120 GB. The ESTIMATE percentage was defaulted to 10%, and no activity other than ANALYZE was allowed on this clone during the entire period. Table statistics including row count, average row length and blocks occupied were compared. This showed differences in row count and average row length on 321 of the tables. Row count differences ranged from 53 rows less in the ESTIMATE of a table containing 205,743 rows (0.025%) all the way up to a difference of 101,704 in 13,311,090 rows (0.76%). Even assuming a maximum difference of 5% in these scenarios, you are not far off the goal. Further analysis showed that a smaller average row length coupled with a small table produced larger variations than were usually seen.

The differences, however, were far more pronounced in indexes - differences of up to 300% were noticed.
Further analysis showed that this was related to the percentage of deleted leaf rows in the index. If this percentage is high, the possibility of ESTIMATE going wrong is also high, as the deletions are not factored in correctly. This was especially true if the deletions occurred in leaf blocks that were probably not involved in the ESTIMATE. When the number of deleted leaf rows within the index was low or even nil, the percentage difference was much lower, in the range of 4 to 5%.

The real myth killer is the cost of COMPUTE versus ESTIMATE - COMPUTE required 66,553,308 reads versus 38,951,158 reads for ESTIMATE, almost 70% more reads for COMPUTE. The sorting involved in determining the averages and data distribution was a clincher - COMPUTE processed 4,263,724,259 rows in sorting operations while ESTIMATE sorted just 18,025,069 - i.e. roughly 235 times as many rows were sorted for the COMPUTE operation. The last nail in the coffin was the time taken to COMPUTE statistics - about 36 hours, against just 12 hours to ESTIMATE.

While the figures speak for themselves, we will offer some general advice to the cautious: ESTIMATE on tables and COMPUTE on indexes. Columns are analyzed by default but serve no useful purpose other than showing data spread; hence, you could ANALYZE only tables and indexed columns alone. An identified list of 'small' tables could also be COMPUTEd rather than ESTIMATEd. This advice is given because ESTIMATE on a table comes close as far as row count goes, while COMPUTE on indexes generates a more accurate picture of both data distribution and object size statistics. Testing the effectiveness of COMPUTE versus ESTIMATE is simple and provides you with figures you can use to decide the strategy for your situation. Before we move to the next topic, keep in mind that an ANALYZE ESTIMATE with a sample size greater than or equal to 50% will result in a COMPUTE.
--- End Quote ---

The problem is that this simple mathematical model looks only at object sizes and does not consider column spread and sensitivity. However, I believe that the combination of ESTIMATE on tables and COMPUTE on indexes would catch most of it. As always, YMMV!

John Kanagaraj

-----Original Message-----
From: Jack Silvey [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, May 21, 2002 2:44 PM
To: Multiple recipients of list ORACLE-L
Subject: Statistical sampling and representative stats collection

Hi all,

Did some investigation about statistical sampling this weekend since we are going to optimize our analyze process soon, and would like some input from all you orabrains on this one. I opened a TAR with Oracle asking about the sampling
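[Ed.: the strategy John lands on - ESTIMATE on tables, COMPUTE on indexes, COMPUTE on an identified list of small tables - looks roughly like this in ANALYZE terms; the object names are placeholders:]

```sql
-- Large table: a 10% sample is usually close on row counts
ANALYZE TABLE big_txn_table ESTIMATE STATISTICS SAMPLE 10 PERCENT;

-- Its indexes: COMPUTE is comparatively cheap and far more accurate
ANALYZE INDEX big_txn_table_pk COMPUTE STATISTICS;

-- An identified 'small' table: just COMPUTE it
ANALYZE TABLE small_lookup COMPUTE STATISTICS;
```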
RE: Statistical sampling and representative stats collection
John,

Thanks for a very good explanation of the difference between COMPUTE and ESTIMATE. I am changing the scripts to use DBMS_STATS instead of ANALYZE on our 9012 database, but on 8i I'll keep this in mind.

Thanks again
Raj

__
Rajendra Jamadagni
MIS, ESPN Inc.
Rajendra dot Jamadagni at ESPN dot com
Any opinion expressed here is personal and doesn't reflect that of ESPN Inc.
QOTD: Any clod can have facts, but having an opinion is an art!
RE: Statistical sampling and representative stats collection
Hi Jack,

> One question - you mention that an index analyze provides better data
> distribution. Could you discuss what you found in more detail?

What I meant was that the histograms created during an ANALYZE/COMPUTE on indexes will provide an almost perfect picture of the data distribution in those columns. Under _some_ circumstances, the CBO will be able to use this information to decide the best path (FTS or indexed read). On the other hand, and simply stated, when bind variables are used in a cursor, this information about data distribution is not used, since the value of the bind variable is not available during the parse prior to 9i. In other words, the access plan is built without considering the values of bind variables that would otherwise have influenced the plan when histograms (and thus information about data distribution) are available.

However, 9i kinda rectifies this, and I quote from the Fine Manual (Oracle9i Database Performance Guide and Reference):

Cursor Sharing Enhancements: The CBO now peeks at the values of user-defined bind variables on the first invocation of a cursor. This lets the optimizer determine the selectivity of any WHERE clause condition as well as if literals had been used instead of bind variables. When bind variables are used in a statement, it is assumed that cursor sharing is intended and that different invocations are supposed to use the same execution plan.

This gives us the best of both worlds (in some cases). Of course, it all depends on the number of buckets defined for the histograms and the width of the data spread (and that is why I emphasized _some_ and 'simply stated' above). I haven't tested this extensively, and I would appreciate any further inputs from the Gurus! (For the rest of us: a 10053 trace should show what's happening. There was a _great_ presentation from Wolfgang Breitling on this topic at the recent IOUG.)
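[Ed.: a quick way to see the behaviour described above - the histogram influencing the plan only when the parse can see the value - is to compare a literal with a bind variable. SQL*Plus syntax; ORDERS.STATUS is a made-up, heavily skewed column:]

```sql
-- Literal predicate: the CBO can read the STATUS histogram at parse time
SELECT /* literal */ COUNT(*) FROM orders WHERE status = 'OPEN';

-- Bind variable: pre-9i the value is unknown at parse time, so the
-- histogram is effectively ignored; 9i peeks on the first invocation
VARIABLE stat VARCHAR2(10)
EXEC :stat := 'OPEN'
SELECT /* bind */ COUNT(*) FROM orders WHERE status = :stat;
```

Comparing the two plans (autotrace, EXPLAIN PLAN, or a 10053 trace) shows whether the histogram was actually used.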
Btw: searching for 'bucket' in the 8i SQL Reference turned up the NTILE function (new in 8i), and I said Wow! because I was looking for just such a function. Goes to show that we need to read the Fine Manuals more than we normally do!

John Kanagaraj

-- 
Author: John Kanagaraj
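[Ed.: for the curious, NTILE buckets rows in much the way a height-balanced histogram does; a small example against the stock EMP demo table:]

```sql
-- Split employees into salary quartiles (bucket 1 = lowest-paid 25%)
SELECT empno, sal,
       NTILE(4) OVER (ORDER BY sal) AS quartile
  FROM emp
 ORDER BY sal;
```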
RE: Statistical sampling and representative stats collection
John,

Great work. I do have some comments and questions, but don't take them as being critical - your work on this is greatly appreciated. My comments/questions are in-line and only asked so that the list can bounce around some more ideas and thoughts.

> --- Begin Quote ---
> MYTH: COMPUTE IS BETTER THAN ESTIMATE
> This one generates an endless debate actually, so we will not take a firm
> stand either way. Rather, we will present some figures that throw some
> light on the issue and allow us to step back and look at the situation.
> The problem with COMPUTE is that it has to scan the entire table, sort it
> and figure out the exact data distribution. On the other hand, ESTIMATE
> steps through samples of the data, sorts and analyzes only a portion of
> the data.

True - if we can get good stats that result in effective plans by using ESTIMATE, by all means go that route, since a COMPUTE is much more expensive.

> In a recent test for the effectiveness of COMPUTE versus ESTIMATE on a
> static clone of a reasonably large Oracle Apps database, the statistics
> were generated and stored for both COMPUTE and ESTIMATE. The Database
> consisted of about 3,300 tables and 6,000 indexes and occupied
> approximately 120 Gb.

My question here is: would this database and your findings be applicable to other databases and the nature of their data? I wouldn't think so, since the characteristics can be so different, but, if I'm reading you correctly, you aren't saying that ESTIMATE is always the only way. But the title is Myth: Compute is better than Estimate. Well, we have all seen cases where COMPUTE worked out better (or 10% or 1% might have worked better than the 30% specified). So maybe it's just a minor quibble (no, Jack Silvey, not a Tribble), and if the title included ...is ALWAYS better..., then I would totally agree with it being a myth. We have all been there, and I worked with a group recently who had the luxury of doing full COMPUTEs every Sunday.
Someone accidentally analyzed the schema at 30% on Monday and a lot of things went down the toilet. Going back to COMPUTE fixed things. Then again, maybe a 10% ESTIMATE would have fixed things. Jack and I both work with a guy who has talked about COMPUTE resulting in undesired plans, 10% did as well. They got the desired plans by going to 1%. So, even if one agrees that we don't necessarily have to COMPUTE, and in many (probably most?) cases we don't, there is still a lot of testing to be done to find the best estimate percent, and this could very well be different for various objects. And I think that's the battle we all face -- what is the best sampling percentage. And right now, it still seems to be done on a trial and error basis. I have some ideas on how one might attack this in an automated fashion, but it's still a *very* rough idea that I need to bounce off a few cohorts. SNIP While the figures speak for themselves, we will offer some general advice to the cautious: ESTIMATE on tables and COMPUTE on Indexes. Columns are analyzed by default, but serve no useful purpose other than showing data spread. Hence, you could ANALYZE only Tables and Indexed columns alone. If I read this correctly, are you saying we only need to gather table stats and the column stats for indexed columns, and that there is no practical use for these stats on non-indexed columns? If so, I disagree on this point, even if it is general advice and not a rule. Stats on non-indexed columns can play a *large* role in CBO decisions. I'm not going to go into details and examples here illustrating that, but those stats can still help decide the driving table, the join methods between tables, etc. I built a sample case some time back to illustrate the importance of gathering these non-indexed column stats. 
Now, it might not be important for all systems, but if you are ever using indexed columns, and, still specifying criteria on non-indexed columns, the gathering of stats on the non-indexed columns could be *very* important. I can send you more details back-channel if you are interested. An identified list of 'small' tables could also be COMPUTED rather than ANALYZED. This advice is given because ESTIMATE on a table comes close as far as row count goes, while COMPUTE on Indexes generates a more accurate picture of both data distribution as well as object size statistics. Testing the effectiveness of COMPUTE versus ANALYZE is simple and provides you with figures that you can use to decide the strategy for your situation. Ok, it sounds like you aren't saying a one size fits all. Before we move to the next topic, keep in mind that an ANALYZE/ESTIMATE with a sample size greater than or equal to 50% will result in COMPUTE. --- End Quote --- The problem is that this simple mathematical model looks only at object sizes and did not look at Column spread and sensitivity. However, I believe that the combination of ESTIMATE on Tables and COMPUTE on Indexes would catch most of it. As
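The trial-and-error search Larry describes can at least be mechanized. Below is a rough, hypothetical sketch of the idea in Python -- nothing to do with DBMS_STATS internals, and all names are mine: simulate block-level sampling, take the full count as the COMPUTE-style baseline, then walk the sample percentage upward until the estimated row count lands within tolerance.

```python
import random
from typing import List, Tuple

def estimate_rows(blocks: List[int], sample_pct: float, seed: int = 1) -> int:
    """Estimate total rows from a random sample of blocks: average the
    rows-per-block over the sample, then scale to the full block count."""
    rng = random.Random(seed)
    k = max(1, round(len(blocks) * sample_pct / 100))
    picked = rng.sample(blocks, k)
    return round(sum(picked) / k * len(blocks))

def smallest_good_pct(blocks: List[int], tolerance: float = 0.01,
                      pcts=(1, 5, 10, 30, 50)) -> Tuple[float, int]:
    """Walk candidate sample percentages upward and return the first whose
    row-count estimate lands within `tolerance` of the full count."""
    true_count = sum(blocks)               # the COMPUTE-style baseline
    for pct in pcts:
        est = estimate_rows(blocks, pct)
        if abs(est - true_count) / true_count <= tolerance:
            return pct, est
    return 100, true_count                 # give up and just COMPUTE

# A fake 10,000-block table holding 50-150 rows per block.
rng = random.Random(7)
blocks = [rng.randint(50, 150) for _ in range(10_000)]
pct, est = smallest_good_pct(blocks)
print(pct, est, sum(blocks))
```

In real life the "estimate" step would be an ANALYZE at each percentage against a static clone, compared against a one-off COMPUTE -- expensive, but done once per representative dataset.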
RE: Statistical sampling and representative stats collection
John, are you saying to create histograms on all indexed columns, or just the ones with skewed distributions -- and also for ones which, although symmetric in distribution, have some values much more prevalent than others? Ian MacGregor Stanford Linear Accelerator Center [EMAIL PROTECTED]

-Original Message- Sent: Friday, May 24, 2002 3:01 PM To: Multiple recipients of list ORACLE-L

Hi Jack, One question - you mention that an index analyze provides better data distribution. Could you discuss what you found in more detail?

What I meant was that the histograms that are created during an ANALYZE/COMPUTE on Indexes will provide an almost perfect picture of the data distribution in such columns. Under _some_ circumstances, the CBO will be able to use this information to decide the best path (FTS or indexed read). On the other hand, and simply stated, when bind variables are used in a cursor, this information about data distribution is not used, since the value of the bind variable is not used during the parse prior to 9i. In other words, the access plan is built without considering the value of bind variables that would have otherwise influenced the plan when histograms (and thus information about data distribution) are available.

However, 9i kinda rectifies this, and I quote from the Fine Manual (Oracle9i Database Performance Guide and Reference): "Cursor Sharing Enhancements: The CBO now peeks at the values of user-defined bind variables on the first invocation of a cursor. This lets the optimizer determine the selectivity of any WHERE clause condition, as well as if literals had been used instead of bind variables. When bind variables are used in a statement, it is assumed that cursor sharing is intended and that different invocations are supposed to use the same execution plan." This gives us the best of both worlds (in some cases).

Of course, it all depends on the number of buckets defined for the histograms and the width of the data spread (and that is why I emphasized _some_ and 'simply stated' above). I haven't tested this extensively, and I would appreciate any further inputs from the Gurus! (For the rest of us: a 10053 trace should show up what's happening. There was a _great_ presentation from Wolfgang Breitling on this topic at the recent IOUG.)

Btw: Searching for 'bucket' in the 8i SQL reference came up with the NTILE function (new in 8i), and I said Wow! because I was looking for such a function. Goes to say that we need to read the fine manuals more than we normally do!

John Kanagaraj Oracle Applications DBA DBSoft Inc (W): 408-970-7002 The manuals for Oracle are here: http://tahiti.oracle.com The manual for Life is here: http://www.gospelcom.net ** The opinions and statements above are entirely my own and not those of my employer or clients **

-- Please see the official ORACLE-L FAQ: http://www.orafaq.com -- Author: John Kanagaraj INET: [EMAIL PROTECTED] Fat City Network Services-- (858) 538-5051 FAX: (858) 538-5051 San Diego, California-- Public Internet access / Mailing Lists To REMOVE yourself from this mailing list, send an E-Mail message to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing).

-- Author: MacGregor, Ian A. INET: [EMAIL PROTECTED]
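Since NTILE comes up here as the "bucket" function: it deals ordered rows into a fixed number of nearly equal-height buckets, which is the same idea behind height-balanced histograms. For anyone without a database handy, here is a rough Python equivalent of the SQL semantics (my own sketch, not Oracle's implementation):

```python
def ntile(values, buckets):
    """Assign each value (in sorted order) a bucket number from 1..buckets.
    As with SQL's NTILE, the first buckets absorb any leftover rows, so
    bucket sizes differ by at most one."""
    ordered = sorted(values)
    base, extra = divmod(len(ordered), buckets)
    result, idx = [], 0
    for b in range(1, buckets + 1):
        size = base + (1 if b <= extra else 0)
        result.extend((v, b) for v in ordered[idx:idx + size])
        idx += size
    return result

# Ten rows into 4 buckets: sizes 3, 3, 2, 2.
rows = [5, 1, 9, 3, 7, 2, 8, 4, 6, 10]
for value, bucket in ntile(rows, 4):
    print(bucket, value)
```

The bucket boundaries this produces are exactly the kind of equi-height split a histogram with N buckets records: each bucket covers roughly the same number of rows, not the same value range.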
RE: Statistical sampling and representative stats collection
Hi Jack, One question - you mention that an index analyze provides better data distribution. Could you discuss what you found in more detail?

What I meant was that the histograms that are created during an ANALYZE/COMPUTE on Indexes will provide an almost perfect picture of the data distribution in such columns. Under _some_ circumstances, the CBO will be able to use this information to decide the best path (FTS or indexed read).

And stats on the non-indexed columns can also play a large role in deciding driving table order and join methods. Ok, touched on that in an earlier email ;-)

On the other hand, and simply stated, when bind variables are used in a cursor, this information about data distribution is not used, since the value of the bind variable is not used during the parse prior to 9i.

In my case, and Jack's (I'm now doing some work with a DB where Jack is dealing with the analyze strategies), the bind thing isn't an issue. Everything is ad-hoc, and literals *are* used. But there really isn't much of an opportunity for sharing SQL even if binds were used. One user might specify 5 values for one column, 3 values for another, 2 values for five other columns. Given the combinations of criteria specified, the number of values specified for each of those columns, not to mention the tables specified, very few, if any, of the SQL statements could be shared even if using binds. Plus, in this case, with histograms being very valuable, one could live with less cursor sharing even if there were some statements that could be shared when using binds. In this case, the literals are needed and their use is not causing any shared pool or library cache contention.

Btw: Searching for 'bucket' in the 8i SQL reference came up with the NTILE function (new in 8i), and I said Wow! because I was looking for such a function. Goes to say that we need to read the fine manuals more than we normally do!

The analytic functions are great. They first came about in 8.1.6, with a few more added in 8.1.7, and were taken even further in 9i. A lot of the traditional ways we might have done things, oftentimes including self joins or procedural code, are thrown out the window. I've found all kinds of uses for them that (1) improve performance over the old approaches, and (2) are simpler to understand. Then again, some of the analytic function examples leave my head spinning. I'm still working through a lot of them for better understanding. But yeah, analytic functions like NTILE are very, very nice.

John Kanagaraj Oracle Applications DBA DBSoft Inc (W): 408-970-7002

-- Author: Larry Elkins INET: [EMAIL PROTECTED]
RE: Statistical sampling and representative stats collection
Well, if the temp had any brains, the job should take less than an hour. Maybe your temp employee was hoping for a promotion to damagement, in which case the job would take days.

Jared

"Jamadagni, Rajendra" [EMAIL PROTECTED] wrote on 05/22/2002 08:53 AM:

I remember, in 96/97, I was working on an Oracle 7 database and none of this SAMPLE() functionality was available. My task was to select x people out of z number of people randomly. We hired a temp, who took the Dallas/Fort Worth directory (I was working in that area) and counted (yes, counted) the number of people whose last names started with A, and B, and so on. We put this information into a table and then worked out the percentage of last names starting with each letter compared to the whole list. This percentage was used as a weightage for each letter when choosing random samples from a list of people. Then I used this weightage and dbms_random to pick the required number of names from our database. It was fun to implement, but I feel bad for the temp who had to count the names. So, compared to that, the 8i and 9i built-in features rock!!

Raj __ Rajendra Jamadagni MIS, ESPN Inc. Rajendra dot Jamadagni at ESPN dot com Any opinion expressed here is personal and doesn't reflect that of ESPN Inc. QOTD: Any clod can have facts, but having an opinion is an art!

This e-mail message is confidential, intended only for the named recipient(s) above and may contain information that is privileged, attorney work product or exempt from disclosure under applicable law. If you have received this message in error, or are not the named recipient(s), please immediately notify ESPN at (860) 766-2000 and delete this e-mail message from your computer. Thank you.
RE: Statistical sampling and representative stats collection
Jack, I conducted some tests of ANALYZE ESTIMATE vs COMPUTE for my IOUG paper and arrived at the following (cut-and-paste of the relevant parts of the paper):

--- Begin Quote --- MYTH: COMPUTE IS BETTER THAN ESTIMATE This one generates an endless debate actually, so we will not take a firm stand either way. Rather, we will present some figures that throw some light on the issue and allow us to step back and look at the situation. The problem with COMPUTE is that it has to scan the entire table, sort it and figure out the exact data distribution. On the other hand, ESTIMATE steps through samples of the data, sorts and analyzes only a portion of the data.

In a recent test of the effectiveness of COMPUTE versus ESTIMATE on a static clone of a reasonably large Oracle Apps database, the statistics were generated and stored for both COMPUTE and ESTIMATE. The database consisted of about 3,300 tables and 6,000 indexes and occupied approximately 120 GB. The ESTIMATE percentage was defaulted to 10% and no activity other than ANALYZE was allowed on this clone during the entire period. Table statistics including row count, average row length and blocks occupied were analyzed. This showed that there were some differences in row count and average row length on 321 of these tables. Row count differences ranged from 53 rows fewer in the ESTIMATE of a table containing 205,743 rows (0.025%) all the way up to a difference of 101,704 in 13,311,090 rows (0.76%). Even assuming a difference of a maximum of 5% in these scenarios, you are not far off the goal. Further analysis showed that a smaller average row length coupled with a small table produced larger variations than were usually seen.

The differences, however, were far more pronounced in Indexes - differences of up to 300% were noticed. Further analysis showed that this was related to the percentage of deleted leaf rows in the index. If this percentage is high, the possibility of ESTIMATE going wrong was also high, as the deletions are not factored in correctly. This was especially true if the deletions occurred in leaf blocks that were probably not involved in the ESTIMATE. When the number of deleted leaf rows was low or even nil within the index, the percentage difference was much lower, in the range of 4 to 5%.

The real myth killer is the cost of COMPUTE versus ESTIMATE - COMPUTE required 66,553,308 reads versus 38,951,158 reads for ESTIMATE - almost 70% more reads for COMPUTE. The sorting involved in determining the averages and data distribution was a clincher - COMPUTE processed 4,263,724,259 rows in sorting operations while ESTIMATE sorted just 18,025,069 - i.e., about 237 times as many rows were sorted for the COMPUTE operation. The last nail in the coffin was the time taken to COMPUTE statistics - about 36 hours against the time to ESTIMATE of just 12 hours.

While the figures speak for themselves, we will offer some general advice to the cautious: ESTIMATE on tables and COMPUTE on Indexes. Columns are analyzed by default, but serve no useful purpose other than showing data spread. Hence, you could ANALYZE only Tables and Indexed columns alone. An identified list of 'small' tables could also be COMPUTED rather than ANALYZED. This advice is given because ESTIMATE on a table comes close as far as row count goes, while COMPUTE on Indexes generates a more accurate picture of both data distribution as well as object size statistics. Testing the effectiveness of COMPUTE versus ESTIMATE is simple and provides you with figures that you can use to decide the strategy for your situation. Before we move to the next topic, keep in mind that an ANALYZE/ESTIMATE with a sample size greater than or equal to 50% will result in a COMPUTE. --- End Quote ---

The problem is that this simple mathematical model looks only at object sizes and did not look at column spread and sensitivity. However, I believe that the combination of ESTIMATE on Tables and COMPUTE on Indexes would catch most of it. As always, YMMV!

John Kanagaraj Oracle Applications DBA DBSoft Inc (W): 408-970-7002 The manuals for Oracle are here: http://tahiti.oracle.com The manual for Life is here: http://www.gospelcom.net ** The opinions and statements above are entirely my own and not those of my employer or clients **

-Original Message- From: Jack Silvey [mailto:[EMAIL PROTECTED]] Sent: Tuesday, May 21, 2002 2:44 PM To: Multiple recipients of list ORACLE-L Subject: Statistical sampling and representative stats collection

Hi all, Did some investigation about statistical sampling this weekend since we are going to optimize our analyze process soon, and would like some input from all you orabrains on this one. I opened a TAR with Oracle asking about the sampling algorithm of stats collection, and they assured me it was random. The goal of analyze...estimate is to collect stats that are representative of the data population as a whole using a given sample set.
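For anyone wanting to sanity-check the relative costs quoted in the excerpt, the arithmetic on the figures given is quick (plain Python; the formatting is mine):

```python
# Figures quoted in the IOUG paper excerpt above.
compute_reads, estimate_reads = 66_553_308, 38_951_158
compute_sorted, estimate_sorted = 4_263_724_259, 18_025_069
compute_hours, estimate_hours = 36, 12

# COMPUTE did roughly 71% more reads than the 10% ESTIMATE...
print(f"extra reads for COMPUTE: {compute_reads / estimate_reads - 1:.0%}")

# ...sorted a couple hundred times as many rows...
print(f"rows sorted, COMPUTE vs ESTIMATE: {compute_sorted / estimate_sorted:.0f}x")

# ...and took three times as long.
print(f"elapsed: {compute_hours / estimate_hours:.0f}x longer to COMPUTE")
```

Note the sort-row figure is a ratio of totals, not a percentage increase: COMPUTE sorted on the order of 237 times as many rows, which is a far stronger statement than "237% more."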
RE: Statistical sampling and representative stats collection
-Original Message- From: Jack Silvey [mailto:[EMAIL PROTECTED]] Sent: Tuesday, May 21, 2002 2:44 PM To: Multiple recipients of list ORACLE-L Subject: Statistical sampling and representative stats collection

Hi all, Did some investigation about statistical sampling this weekend since we are going to optimize our analyze process soon, and would like some input from all you orabrains on this one. I opened a TAR with Oracle asking about the sampling algorithm of stats collection, and they assured me it was random. The goal of analyze...estimate is to collect stats that are representative of the data population as a whole using a given sample set. Since analyzing tables takes up resources (does sorts to order the data for investigation), the fewer rows you use in estimate, the less system resources you use and the faster the analyze will go. Since our goal is to get as small a sample as possible and still have stats that are representative, my contention is that we could start by finding what the margin of error will be for each sample size and gauge our tolerance for it. One standard way to calculate margin of error for a given sample is by using this formula: M = 1/SQRT(N), where M = margin of error and N = sample size. So, if we can tolerate stats that have a 1% margin of error (will deviate from representative of the whole population by 1%), our sample size should be 10,000 rows. Also, a corollary (not a toyota corollary, though) to this would be that the more rows you add to your === message truncated ===

-- Author: Jack Silvey INET: [EMAIL PROTECTED]
Re: Statistical sampling and representative stats collection
Jack, What version are you on? Are you able to utilize the gather_stale option? That way you would be optimizing not only the amount you are estimating but also the interval between analyzes. Of course, that option of DBMS_STATS is not available on older versions. Cherie Machler Oracle DBA Gelco Information Network

Jack Silvey [EMAIL PROTECTED] wrote on 05/21/02 06:19 PM:

Hi Rafiq, We have been using 35 percent on our warehouse, even on our fact partitions. Now that I have thought about it for a while, that seems like a lot given the volume of data. If a representative sample can be gathered with 10,000 or 50,000 or 100,000 rows, and our fact partitions have millions of rows each, it seems like we could go 1% on our analyze and it would be within acceptable tolerances. /jack

--- Mohammad Rafiq [EMAIL PROTECTED] wrote: Most of the list members agree on estimate with 30%. Regards Rafiq

Reply-To: [EMAIL PROTECTED] To: Multiple recipients of list ORACLE-L Date: Tue, 21 May 2002 13:43:33 -0800

Hi all, Did some investigation about statistical sampling this weekend since we are going to optimize our analyze process soon, and would like some input from all you orabrains on this one. I opened a TAR with Oracle asking about the sampling algorithm of stats collection, and they assured me it was random. The goal of analyze...estimate is to collect stats that are representative of the data population as a whole using a given sample set. Since analyzing tables takes up resources (does sorts to order the data for investigation), the fewer rows you use in estimate, the less system resources you use and the faster the analyze will go.

Since our goal is to get as small a sample as possible and still have stats that are representative, my contention is that we could start by finding what the margin of error will be for each sample size and gauge our tolerance for it. One standard way to calculate margin of error for a given sample is by using this formula:

M = 1/SQRT(N), where: M = margin of error, N = sample size

So, if we can tolerate stats that have a 1% margin of error (will deviate from representative of the whole population by 1%), our sample size should be 10,000 rows. Also, a corollary (not a toyota corollary, though) to this would be that the more rows you add to your sample, the closer to representative your sample will be. So, in order to test whether your sample is representative enough, you could analyze using either estimate 49% or compute, take a snapshot of the stats, and then compare the stats from a 10,000 row estimate to those. Then, add rows to your estimate until you are satisfied with the stats. This of course is a pie-in-the-sky mathematical model, but seems like a reasonable place to start with testing. Input? Input? Bueller? Bueller? /jack silvey

-- Author: Jack Silvey INET: [EMAIL PROTECTED]
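Jack's back-of-the-envelope formula can be turned into a tiny calculator. This is a hedged sketch of the math exactly as stated in the thread (M = 1/sqrt(N)), not Oracle's internal sampling model; the function names are mine:

```python
import math

def margin_of_error(sample_size):
    """Margin of error for a simple random sample, per M = 1/sqrt(N)."""
    return 1.0 / math.sqrt(sample_size)

def required_sample_size(margin):
    """Invert the formula: N = 1/M^2 rows for a target margin M."""
    return round(1.0 / margin ** 2)

# A 1% margin of error needs 10,000 rows -- regardless of table size.
print(required_sample_size(0.01))   # 10000
print(margin_of_error(10_000))      # 0.01
```

Note what falls out of the inversion: halving the margin of error quadruples the required sample, which is why chasing very tight tolerances gets expensive fast.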
Re: Statistical sampling and representative stats collection
Jack, Isn't the problem with this concept that it doesn't take into consideration how skewed the data is? Statistical significance would be relevant to perfectly distributed data, but wouldn't you need a higher percentage of the data for significance in more highly skewed data? Just something to consider. Cherie Machler Oracle DBA Gelco Information Network

Jack Silvey [EMAIL PROTECTED] wrote on 05/21/02 06:19 PM:

Hi Rafiq, We have been using 35 percent on our warehouse, even on our fact partitions. Now that I have thought about it for a while, that seems like a lot given the volume of data. If a representative sample can be gathered with 10,000 or 50,000 or 100,000 rows, and our fact partitions have millions of rows each, it seems like we could go 1% on our analyze and it would be within acceptable tolerances. /jack

[snip]
RE: Statistical sampling and representative stats collection
Jack, Nielsen Ratings (the TV ratings company) monitors about 5000 people (and their TV watching habits) to supply ratings for all the shows on most of the networks for the whole United States. So, as long as you have a working and proven statistical model, and a good sample, it works. How do I know? Ever seen anyone challenge Nielsen Ratings for a show? I haven't. Raj __ Rajendra Jamadagni MIS, ESPN Inc. Rajendra dot Jamadagni at ESPN dot com Any opinion expressed here is personal and doesn't reflect that of ESPN Inc. QOTD: Any clod can have facts, but having an opinion is an art!
RE: Statistical sampling and representative stats collection
Jack, Raj, I agree. The main point, I feel, is that if you follow statistics theory, which a good part of our modern technology relies on, you will test a fixed number of samples rather than a percentage of the table rows. For a small table, you may have to sample the entire table to get results that work. As a real-life example, you wouldn't sample 10 US Senators and expect those results to be accurate. No, you would simply survey each Senator (and hope they don't change their minds). Similarly, don't just sample 30 percent of a 1,000-row table; sample all 1,000 rows.

For a large table, sampling a percentage would oversample and be wasted effort. If you have a million-row table and a hundred-million-row table, the same sample size will produce results nearly as accurate for both. That is why you don't see nearly as many state political polls: it is nearly as much expense to accurately sample the citizens in a state as it is to sample all the citizens in the US.

Someone asked about skewed data. Well, that is the reason you perform a RANDOM sample. That is the key point, and what produces many real-life statistical failures. A classic example is the Truman-Dewey presidential race in 1948. The pioneer pollsters used random samples of phone numbers and confidently predicted Dewey's victory. What they neglected was that wealthier people had telephones in greater proportion than poor people. So their sample was skewed, which produced bad results. Here, we're betting on Oracle's statement that the sample is truly random.

Now, if you want a more accurate result, you will sample more. But you aren't increasing the sample size because the table is larger; you are increasing it to improve the accuracy, and to compensate for any other inaccuracies.

Just a thought: if you're responsible for a data warehouse, you may want to consider studying some basic statistics. Unfortunately, most computer science curriculums don't require a class in statistics. In fact, since polls form a lot of our political discussion, it wouldn't hurt to require all citizens to have some statistical training. It might make it harder for politicians to misconstrue statistical results. However, it is hard enough to get people just to vote, so I suppose that one isn't going to fly.

Dennis Williams DBA Lifetouch, Inc. [EMAIL PROTECTED]

-Original Message- Sent: Wednesday, May 22, 2002 9:39 AM To: Multiple recipients of list ORACLE-L

Jack, Nielsen Ratings (the TV ratings company) monitors about 5000 people (and their TV watching habits) to supply ratings for all the shows on most of the networks for the whole United States. So, as long as you have a working and proven statistical model, and a good sample, it works. How do I know? Ever seen anyone challenge Nielsen Ratings for a show? I haven't. Raj __ Rajendra Jamadagni MIS, ESPN Inc. Rajendra dot Jamadagni at ESPN dot com Any opinion expressed here is personal and doesn't reflect that of ESPN Inc. QOTD: Any clod can have facts, but having an opinion is an art!

-- Author: DENNIS WILLIAMS INET: [EMAIL PROTECTED]
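Dennis's point -- that accuracy tracks the absolute sample size, not the sampled fraction -- falls straight out of the M = 1/sqrt(N) formula quoted earlier in the thread. A quick numeric illustration (my own, under that formula's simplifying assumptions):

```python
import math

def margin(n):
    """Margin of error for a random sample of n rows, per M = 1/sqrt(N)."""
    return 1.0 / math.sqrt(n)

# The same 10,000-row sample gives the same ~1% margin for wildly
# different table sizes -- the population size never enters the formula.
for table_rows in (1_000_000, 100_000_000):
    print(table_rows, round(margin(10_000), 4))

# Whereas a fixed 30% sample massively oversamples the larger table:
print(round(margin(0.30 * 1_000_000), 5))     # 300,000 rows sampled
print(round(margin(0.30 * 100_000_000), 6))   # 30,000,000 rows sampled
```

The 30% samples buy margins far tighter than anyone tuning an optimizer plausibly needs, at 30x and 3,000x the I/O of the fixed 10,000-row sample.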
Re: Statistical sampling and representative stats collection
Cherie, Had some problems with DBMS_STATS, but we are making our way towards it. Just have to be careful, since it can do quirky things like analyze the dictionary. We are using dba_tab_modifications to monitor our tables. Thanks for the input. Jack --- [EMAIL PROTECTED] wrote: Jack, What version are you on? Are you able to utilize the gather_stale option. That way you would not only be optimizing the amount you are estimating but the interval between analyzing. Of course, that option of DBMS_STATS is not available on older versions. Cherie Machler Oracle DBA Gelco Information Network Jack Silvey jack_silvey@y To: Multiple recipients of list ORACLE-L [EMAIL PROTECTED] ahoo.comcc: Sent by: Subject: Re: Statistical sampling and representative stats [EMAIL PROTECTED]collection om 05/21/02 06:19 PM Please respond to ORACLE-L Hi Rafiq, We have been using 35 percent on our warehouse, even on our fact partitions. Now that I have thought about it for a while, that seems like a lot given the volume of data. If a representative sample can be gathered with 10,000 or 50,000 or 100,000 rows, and our fact partitions have millions of rows each, seems like we could go 1% on our analyze and it would be within acceptable tolerances. /jack --- Mohammad Rafiq [EMAIL PROTECTED] wrote: The most of the list memeber agrees on estimate with 30% Regards Rafiq Reply-To: [EMAIL PROTECTED] To: Multiple recipients of list ORACLE-L [EMAIL PROTECTED] Date: Tue, 21 May 2002 13:43:33 -0800 Hi all, Did some investigation about statistical sampling this weekend since we are going to optimize our analyze process soon, and would like some input from all you orabrains on this one. I opened a TAR with Oracle asking about the sampling algorithm of stats collection, and they assured me it was random. The goal of analyze...estimate is to collect stats that are representative of the data population as a whole using a given sample set. 
RE: Statistical sampling and representative stats collection
I remember, in '96/'97, I was working on an Oracle 7 database, and none of this SAMPLE functionality was available. My task was to select x people out of z people at random. We hired a temp, who took the Dallas/Fort Worth directory (I was working in that area) and counted (yes, counted) the number of people whose last names started with A, and with B, and so on. We put this information into a table and then worked out the percentage of last names starting with each letter compared to the whole list. This percentage was used as the weighting for each letter when choosing random samples from a list of people. Then I used this weighting and dbms_random to pick the required number of names from our database. It was fun to implement, but I feel bad for the temp who had to count the names. So, compared to that, the 8i and 9i built-in features rock!!

Raj
__
Rajendra Jamadagni
MIS, ESPN Inc.
Rajendra dot Jamadagni at ESPN dot com
Any opinion expressed here is personal and doesn't reflect that of ESPN Inc.
QOTD: Any clod can have facts, but having an opinion is an art!
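Raj's letter-frequency trick is straightforward to sketch in a few lines of Python. The letter weights and names below are made-up placeholders standing in for the temp's hand-counted directory percentages, and `random.choices` samples with replacement, which is a simplification of picking distinct people:

```python
import random

# Hypothetical last-name-initial frequencies, standing in for the
# percentages counted from the Dallas/Fort Worth directory.
letter_weights = {"A": 0.04, "B": 0.09, "C": 0.07, "S": 0.10, "W": 0.05}

# Hypothetical candidate pool.
people = ["Adams", "Baker", "Brown", "Clark", "Smith", "Stone", "White"]

def weighted_pick(people, letter_weights, k):
    """Pick k names, weighting each name by the frequency of its initial.

    Names whose initial wasn't counted get a small default weight.
    Samples with replacement (a simplification of the original task).
    """
    weights = [letter_weights.get(name[0].upper(), 0.01) for name in people]
    return random.choices(people, weights=weights, k=k)

sample = weighted_pick(people, letter_weights, k=3)
```

In Oracle 7 the weighting table plus dbms_random played the role that `letter_weights` and `random.choices` play here; from 8i onward the SAMPLE clause makes the whole exercise unnecessary.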
Re: Statistical sampling and representative stats collection
That is a very good question. Anyone else know the answer? My approach to skewness right now is to compare the mean and the median of the entire dataset, since if you have two measures of centrality that are not the same, you have skewness.

/jack

--- [EMAIL PROTECTED] wrote:

Jack,

Isn't the problem with this concept that it doesn't take into consideration how skewed the data is? Statistical significance would be relevant to perfectly distributed data, but wouldn't you need a higher percentage of the data for significance in more highly skewed data? Just something to consider.

Cherie Machler
Oracle DBA
Gelco Information Network
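Jack's mean-versus-median heuristic can be sketched directly; the 10% tolerance threshold below is an arbitrary illustrative choice, not a figure from the thread:

```python
from statistics import mean, median

def looks_skewed(values, tolerance=0.10):
    """Flag a dataset as skewed when mean and median differ by more than
    `tolerance`, expressed as a fraction of the median. For a symmetric
    distribution the two measures of centrality coincide."""
    m, md = mean(values), median(values)
    if md == 0:
        return m != 0
    return abs(m - md) / abs(md) > tolerance

symmetric = [1, 2, 3, 4, 5]   # mean == median == 3
skewed = [1, 1, 1, 1, 100]    # mean 20.8, median 1

# looks_skewed(symmetric) -> False; looks_skewed(skewed) -> True
```

As Cherie notes, a single pass/fail flag like this says nothing about how much larger a sample a heavily skewed column would need; it only detects that the simple 1/SQRT(N) model may not apply.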
Re: Statistical sampling and representative stats collection
Jack,

You are right, keeping in mind your application and the variation of data in different tables. I just mentioned 30% as the most agreed-upon percentage for estimate. I think Dennis gave you a more practical answer yesterday.

Regards,
Rafiq
RE: Statistical sampling and representative stats collection
Raj,

Would the temp have gotten the same result by counting the frequencies of the letters in the game of Scrabble?

Chaim
RE: Statistical sampling and representative stats collection
We use dbms_stats.gather_schema_stats and do not pass it SYS as a schema owner. Under 8i this method did not work for gathering empty stats, but it did fine with gather stale. This has been fixed in 9i. However, it still abends if one of the schemas owns an IOT overflow table.

Ian
Statistical sampling and representative stats collection
Hi all,

I did some investigation into statistical sampling this weekend, since we are going to optimize our analyze process soon, and I would like some input from all you orabrains on this one. I opened a TAR with Oracle asking about the sampling algorithm of stats collection, and they assured me it is random. The goal of ANALYZE ... ESTIMATE is to collect stats that are representative of the data population as a whole, using a given sample set. Since analyzing tables takes up resources (it does sorts to order the data for investigation), the fewer rows you use in the estimate, the less system resource you use and the faster the analyze will go.

Since our goal is to get as small a sample as possible and still have stats that are representative, my contention is that we could start by finding what the margin of error will be for each sample size and gauge our tolerance for it. One standard way to calculate the margin of error for a given sample is this formula:

    M = 1/SQRT(N)

where M = margin of error and N = sample size.

So, if we can tolerate stats that have a 1% margin of error (that will deviate from representative of the whole population by 1%), our sample size should be 10,000 rows. A corollary (not a Toyota Corolla, though) is that the more rows you add to your sample, the closer to representative your sample will be. So, in order to test whether your sample is representative enough, you could analyze using either ESTIMATE 49% or COMPUTE, take a snapshot of the stats, and then compare the stats from a 10,000-row estimate to those. Then add rows to your estimate until you are satisfied with the stats.

This is of course a pie-in-the-sky mathematical model, but it seems like a reasonable place to start with testing. Input? Input? Buhler? Buhler?

/jack silvey
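The back-of-envelope arithmetic above is easy to check with a short script. Note this is only the thread's simplified 1/SQRT(N) model for a simple random sample; it is not a description of how Oracle's own sampling behaves:

```python
import math

def margin_of_error(sample_size):
    """The thread's rule of thumb: M = 1 / sqrt(N)."""
    return 1.0 / math.sqrt(sample_size)

def sample_size_for(margin):
    """Invert the formula: N = 1 / M^2 rows for a target margin M."""
    return math.ceil(1.0 / margin ** 2)

# A 1% margin of error needs 10,000 rows, as the post says,
# regardless of whether the table has millions of rows.
assert sample_size_for(0.01) == 10_000

# Going from 10,000 to 100,000 sampled rows only shrinks the
# margin from 1% to about 0.32% -- diminishing returns.
print(round(margin_of_error(100_000), 4))  # -> 0.0032
```

This is what makes Jack's 1% suggestion plausible for multi-million-row fact partitions: under this model the required sample size depends on the target margin, not on the table size.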
Re: Statistical sampling and representative stats collection
Most of the list members agree on ESTIMATE with 30%.

Regards,
Rafiq
RE: Statistical sampling and representative stats collection
Jack - I believe that you are on the right track. If you think about it, political pollsters randomly assess national populations with only a few thousand samples, and it works pretty well as long as it isn't biased by bad questions, lying respondents, etc. And you don't have those issues here.

A more important issue, if you wish to reduce the system resources devoted to analyzing tables, would be to consider how volatile each table is. Some tables rarely change, while others change frequently. Then another issue is the overall objective you are trying to achieve. Is it to ensure query optimization? Is it a large population of ad-hoc queries? You may want to consider some of the new stored execution plans for queries that you don't want to suddenly start making bad optimizer decisions.

Dennis Williams
DBA, Lifetouch, Inc.
[EMAIL PROTECTED]
Re: Statistical sampling and representative stats collection
Hi Rafiq,

We have been using 35 percent on our warehouse, even on our fact partitions. Now that I have thought about it for a while, that seems like a lot given the volume of data. If a representative sample can be gathered with 10,000 or 50,000 or 100,000 rows, and our fact partitions have millions of rows each, it seems like we could go to 1% on our analyze and it would be within acceptable tolerances.

/jack

--- Mohammad Rafiq [EMAIL PROTECTED] wrote:
Most of the list members agree on ESTIMATE with 30%.
Collection
Hi Friends:

Does anyone use resource collection in 8i? What advantages do we gain by using it? Are there problems with using it? Are there performance problems? Space issues?

Regards,
Eriovaldo
[EMAIL PROTECTED]
Limeira / Brazil
RE: COLLECTION ITERATOR PICKLER FETCH
Hi Scott,

Pickling is serializing arbitrary object-oriented data structures, that is, converting them into a byte stream for storage, transmission over a network, or iterative navigation, as in this case. It is a more complex process than you might imagine. If you want more information, search the web for information on Python's pickle module.

@   Regards,
@   Steve Adams
@   http://www.ixora.com.au/
@   http://www.christianity.net.au/

-----Original Message-----
Sent: Friday, 11 May 2001 3:11
To: Multiple recipients of list ORACLE-L

Sometimes I really love the developers at Oracle. Does anyone have experience with the infamous Pickler Fetch, which shows up in this explain plan? The select is on index-organized nested tables (nested objects).

Scott Crabtree
Veritas Certified Survivor
-- Author: Steve Adams INET: [EMAIL PROTECTED]
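To make Steve's point concrete, here is a minimal sketch using Python's pickle module (the module he mentions; the nested dictionary is just an illustrative stand-in for a nested-table-style structure, not anything Oracle-specific):

```python
import pickle

# An arbitrary nested data structure, loosely analogous to a parent
# record holding an embedded collection of child rows.
record = {"id": 23, "children": [("a", 1), ("b", 2), ("c", 3)]}

# "Pickling": flatten the structure into a byte stream suitable for
# storage on disk or transmission over a network.
blob = pickle.dumps(record)
print(type(blob))           # <class 'bytes'>

# "Unpickling": rebuild the original structure from the byte stream,
# roughly the kind of work an iterator over a serialized collection
# has to do to walk the embedded data.
restored = pickle.loads(blob)
print(restored == record)   # True
```

The round trip (dumps then loads) is the whole idea: the in-memory object graph and the byte stream are two representations of the same data.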
COLLECTION ITERATOR PICKLER FETCH
Sometimes I really love the developers at Oracle. This should be good fodder for Eric P.

Does anyone have experience with the infamous pickler fetch, which shows up in this explain plan? The select is on index organized nested tables (nested objects). It's obvious that someone in internals has a good sense of humor. I guess this is the way Oracle dereferences the embedded collections?

SELECT STATEMENT
  SORT GROUP BY
    NESTED LOOPS
      NESTED LOOPS
        NESTED LOOPS
          NESTED LOOPS
            ` BLAH BLAH BLAH
            COLLECTION ITERATOR PICKLER FETCH
            VIEW AD_AGG_DATA_MASTER
              UNION-ALL PARTITION
                FILTER
                  TABLE ACCESS BY INDEX ROWID AD_23
                    INDEX RANGE SCAN AD_23_IDX_1
                FILTER
                  TABLE ACCESS BY INDEX ROWID AD_24
                    INDEX RANGE SCAN AD_24_IDX_1
                FILTER
                  TABLE ACCESS BY INDEX ROWID AD_25
                    INDEX RANGE SCAN AD_25_IDX_1

Scott Crabtree
Veritas Certified Survivor

-- Author: Scott Crabtree INET: [EMAIL PROTECTED]
friday fun early, vote for favorites / Re: COLLECTION ITERATOR PICKLER FETCH
dudes/dudettes,

i'm way too brain dead after reading about oracle9i self-tuning, so i'll take refuge in anagram fun:

http://www.wordsmith.org/anagram/

select output for: iterator pickler fetch (it wouldn't take more than 25 chars)

A CRECHE FRITTER IT POLK
A CRECHE FELT IRK PIT ROT
A CRECHE FELT IRK IT PORT
A CRECHE KITE FLIRT PORT
A CRECHE TRIPLET FORK IT
A CRECHE TRIPE FORK TILT
A CRECHE PERK IF TILT ROT
A CRECHE KEPT FLIRT RIOT
A CRECHE TREK FLIP IT ROT
A CRECHE PETROL RIFT KIT
A ELECTRIC HEFT IRK PORT
A ELECTRIC FRET HIT PORK
A ELECTRIC PET FROTH IRK
A COERCE TREK FILTH TRIP
A CLICHE FRET REPORT KIT
A CLICHE PERK RETORT FIT
A RICOCHET FRITTER KELP
A HECTIC REEK FLIRT PORT
A HECTIC FILTER PERK ROT
A HECTIC FELT REPORT IRK
A CORRECT ETHER FLIP KIT

On 10 May 2001, at 9:11, Scott Crabtree scribbled with alacrity and cogency:

> Sometimes I really love the developers at Oracle.. This should be good
> fodder for Eric P.
> Does anyone have experience with infamous pickler Fetch which shows up in
> this explain plan
> ...

-- Author: Eric D. Pierce INET: [EMAIL PROTECTED]
RE: friday fun early, vote for favorites / Re: COLLECTION ITERATO
Yea, and oracle support yields

A corrupt slope
A cursor topple
A cost/rule prop
Oracle port pus
CPA result: poor
Prolate Corpus
Poor trace plus
Real poop crust
Cool purse trap

|| -----Original Message-----
|| From: Eric D. Pierce [mailto:[EMAIL PROTECTED]]
|| Sent: Thursday, May 10, 2001 4:26 PM
|| To: Multiple recipients of list ORACLE-L
|| Subject: friday fun early, vote for favorites / Re: COLLECTION ITERATOR PICKLER FETCH
||
|| select output for: iterator pickler fetch
|| ...

-- Author: Mohan, Ross INET: [EMAIL PROTECTED]
Re: friday fun early, vote for favorites / Re: COLLECTION ITERATOR PICKLER
Eric,

I'm torn between two:

A RICOCHET FRITTER KELP -- nothing like kelp fritters! yummy!

and

A CORRECT ETHER FLIP KIT -- what Oracle needs to install to get Metalink to work properly

That's my 2 pesos for today :-)

John Dailey
ING Corporate Systems DBA
Atlanta, GA

-- Author: INET: [EMAIL PROTECTED]
RE: friday fun early, vote for favorites / Re: COLLECTION ITERATO
Along with those, came these:

SCRAP PURE TOOL
SCRAP PURE LOOT
CRAP, PURE STOOL (i added the comma)
RAPE SCOUR PLOT
PAL PROSECUTOR
STAR ULCER POOP
STAR CRUEL POOP
A SUPER PROCTOL

--Chris [EMAIL PROTECTED]

-----Original Message-----
From: Mohan, Ross [mailto:[EMAIL PROTECTED]]
Sent: Thursday, May 10, 2001 5:17 PM
To: Multiple recipients of list ORACLE-L
Subject: RE: friday fun early, vote for favorites / Re: COLLECTION ITERATO

> Yea, and oracle support yields
> ...