Ouch, I gotta take a day off to read this one ;) Tanel.
----- Original Message ----- To: "Multiple recipients of list ORACLE-L" <[EMAIL PROTECTED]> Sent: Monday, December 08, 2003 11:29 PM > Hi Yong, > > Saying there are a "few" errors is being a little kind to Don's "Inside > Oracle Indexing" article. > > In part, these are some of the issues I raised directly with Don in a number > of emails (warning somewhat on the longish side ;): > a.. There are no such things as star indexes. Star joins, yes, star > transformations yes, but not star indexes ? > b.. I still disagree with your description of b-tree indexes being complex > and difficult to understand, but then again this could just be my personal > perception (check out > http://groups.google.com/groups?q=g:thl3498916429d&dq=&hl=en&lr=&ie=UTF-8&se > lm=ant%259.39604%24jM5.100537%40newsfeeds.bigpond.com&rnum=47 where I have a > sample demo on how to investigate the workings of b-tree indexes.) However, > by understanding them and a how they function, the question of whether or > not they need rebuilding no longer needs to be debated. It becomes easily > apparent under what conditions indexes could benefit from a rebuild. I'll > expand on this later but I would suggest those that "debate", those that > really don't know when a rebuild is justified and just rebuild in the hope > it might do some good are those that really don't understand "how" indexes > function. Knowledge is the key that unlocks the door of doubt and those > without the key fumble aimlessly and prod around in hope... > c.. Your subsequent quote "There is enough anecdotal evidence that index > rebuilding has helped some systems perform better, and I also have no doubt > that there is no scientific basis for the claim" is a nonsense. Of course > one explain in scientific terms such performance improvements, I can only > suggest that you unfortunately can't. Oracle is not some magic piece of > software and it doesn't run on some magical pieces of hardware. Any > suggestions to the contrary are not helpful to anyone. > d.. I still disagree with the double the block size, halving the logical > reads must be a good thing argument. It's a path that could lead to a very > disappointing conclusion (read cliff edge). Indexes prefer large block sizes > true but if the underlining storage file-system is not tuned to read (or > write) these larger block sizes efficiently, then the whole thing is counter > productive. You've been warned ... > e.. Your description of PCTUSED is still wrong. There is no PCTUSED for > indexes so it really shouldn't be misleading to confuse a non-existing index > attribute with the amount of used space as documented in INDEX_STATS... > f.. Including in your criteria for rebuilding an index "btree_space being > greater than a block" is redundant when listed with the other criteria. It > is fundamentally impossible for an index with 4 levels or more to consist of > a single block, so why mention it. It just adds confusion and is silly. The > DBA who swears by this criteria (which I noticed has changed in this draft > ;), how do they make such a claim? It's one thing to swear, it's quite > another to prove. Your table that lists average rows and blocks per > different index levels shows that those indexes with a leaf row length of > 500,000,000 and with 100,000 blocks require 4 levels. How does rebuilding > such indexes with no subsequent change in index level improve performance ? > I mean, large indexes need more levels right, so rebuilding them all the > time and keeping the levels unchanged only to rebuild them again because > they're still 4 or more levels seems like a pointless, never-ending exercise > in futility. To rebuild an index that "actually" results in a reduction in > it's level generally requires a "drastic" reduction in it's data volume due > to the orders of sizing magnitude that a new level represents. More on this > and the other so-called rebuild criteria later but the current level of an > index is not a criteria for a rebuild. A level 3 index could conceivably be > rebuilt to just a level 1 (if there were heaps and heaps of deletions) and a > level 5 index could be rebuilt to stay at level 5. Which index has > benefited . > g.. Criteria for a rebuild: or the total length of deleted is > 1 block > makes no sense whatsoever. Nearly all indexes would have a total length of > deleted > than 1 block meaning nearly all indexes need rebuilding. I don't > think so ... > h.. Your discussion on the clustering factor affecting the likelihood of > requiring an index rebuild is still flawed, however interestingly, you've > now given an example on why this is the case. However, you've still come to > the wrong conclusion !! Firstly, you're incorrect in your example to say > that a 1,000,000 row table with a clustering factor of 1,000,000 has it's > rows in the same order as it's index although I guess this could be a typo. > Regardless, if you delete all last_name beginning with a K, you are going to > delete consecutive leaf nodes regardless of the clustering factor. So what > difference does it make to the "index". None. To the table, yes, you either > delete rows from all differing blocks or rows from a small number of blocks > but to the index, it makes no difference, hence your claim makes no sense. I > think you've confused what the clustering factor of an index represents ... > OK, your whole discussion of these two "camps", this whole concept of both > being right, or wrong, or whatever, is pointless as it doesn't resolve > anything. You mention that the "Academics" (a term I dislike) claim that > "indexes rarely benefit from a rebuild" without discussing what academics > mean by "rarely". Obviously they accept that index rebuilds are sometimes > (rarely) beneficial, so what are these cases ? You mention that the > pragmatic approach sometimes results in better performance and that index > rebuilds are sometimes clearly beneficial. So obviously, they have a case. > It seems obvious (to me anyway) that perhaps there's an overlap here, that > perhaps "everyone" agrees that index rebuilds are beneficial. Maybe some > have the "key" and know how to unlock the doors directly whiles others do a > bit more pocking around in the dark ? > > > > I think there are two fundamental questions/issues you've failed to address: > > > > - Why/when would an index require and benefit from a rebuild ? > > - How would one monitor that such a rebuild indeed has been beneficial ? > > > > Let me attempt to address these questions. > > > > Firstly, why would an index require a rebuild ? Answer, because the index is > currently inefficient and by rebuilding it, Oracle will "noticeably" improve > it's performance to the point that the cost of rebuilding the thing is > justified. It's all kinda simple really. So what is an "inefficient" index ? > One that has so much "wasted" space, that by rebuilding and reclaiming this > space, would reduce the "cost" of accessing this index (or indeed Oracle > could now choose to use the index in the first place) such that performance > now "noticeably" improves. The key words here are "wasted", "cost" and > "noticeably". > > > > So what is wasted space ? Well any space that is not currently used within > the index structure is potentially wasted. However, a key point is that if > this space is either: > > > > - going to be subsequently used within an appropriate timeframe, or > > - going to reoccur within an appropriate timeframe > > > > then it's not really wasted is it ? I mean, if we're going to subsequently > use this space, then this "unused" space is not really an issue. If after > the rebuild, this unused space subsequently returns, then the rebuild is > kinda pointless isn't it ? So space is only really wasted if we don't intend > to use it or if by getting rid of the wasted space, we keep it away. > > > > Note that "some" unused space is a good thing. Why, because it gives index > blocks spare capacity to avoid block splits. Block splits occur when a block > has insufficient free space in which to store new index entries and a block > split is not particularly nice. It involves extra I/O to get a new leaf > node, it involves extra CPU to redistribute the index data, it requires > extra redo, etc. etc. It also results in now two leaf nodes having 50% > unused space. Net effect, reduced performance and the generation of unused > space, exactly what the rebuild was trying to prevent . So avoiding block > splits is a good thing that unused space provides. > > > > How does an index get wasted space. Well if we keep our above criteria in > mind, not that easily. Note that current free space within an index can > generally be consumed by subsequent inserts, note that deleted index space > can be subsequently reused, note that totally emptied blocks can be reused > by subsequent index splits. So the chance of any free space being eventually > used is high (please see Metalink Note 182699.1 where Oracle have published > my warnings regarding unnecessarily rebuilding indexes due to these > factors). However, there are situations when this free space may never get > reused and so is potentially "wasted" which include: > > a.. An index is created with an excessive PCTFREE value which subsequent > index growth will never use (somewhat rare and a stupid thing to do in the > first place) > b.. When we have deletes with monotonically increasing index entries. The > deleted space can not be reused as all new entries live in the last index > node unless all entries are deleted from the node. So it's sparse deletes on > incrementally increasing index values. Note this requires knowledge of the > characteristics of the index to identify. > c.. Similar scenario to above, but sparse deletions of ranges of values > that are no longer valid insertable values > d.. When we perform a large/bulk delete with no prospect of re-entering > the same volume of data. However note in this case the table itself would > likely have an inflated HWM and so it's the table (and hence implicitly the > indexes) that would potentially benefit from a rebuild. > e.. When we have enough occurrences of particular index values that they > span over multiple index nodes. As Oracle: > a.. performs 50/50 block splits (unless it's the highest value in the > leaf node where a 90-10 split is generated), and > b.. inserts only into the last referenced leaf node of the value > Oracle will leave behind a trail of ½ emptied blocks that can not be > filled as they only contain references to the single index value which can > only be inserted into the last leaf node containing this referenced value > (again, unless all the corresponding index entries are subsequently > deleted). These indexes are identified as those with a low ratio of distinct > values to leaf blocks (except in rare cases with wildly non uniform > distribution of data) > > In most other situations, current unused space is "useable". Therefore > indexes that "potentially" require rebuilding are those that have > "sufficient" unused space AND meet the above criteria. Note this is the > *only* metric worth considering when determining to rebuild an index. What > is the current used/unused space in the index (pct_used) AND what are the > characteristics of the index that would prevent this space from being > subsequently used within an appropriate period of time. Note that the > criteria listed above rules out the vast majority of indexes from being > rebuild candidates. > > > > So what is "sufficient" unused space that would warrant a rebuild ? Again, > it goes back to my early point. Those indexes by which removing this > "wasted" space would result a noticeable improvement in performance. > Surprisingly, this is rarer than many imagine. > > > > Let me give you a typical example (one similar to Jonathan's in his DBAzine > article). > > > > I have a "very inefficient" 4 level b*tree index, one in which my leaf nodes > are 50% empty. It currently only houses 100 index entries when it could > potentially store 200. I have a query that uses this index via a range scan > which results in 1000 rows returned. Before the rebuild, we require: > > > > 3 LIOs to navigate the index branches > > 10 LIOs to read all the necessary index entries from the index > leaf nodes > > 1000 LIOs to access the row data stored in the table > > > > Total 1013 LIOs. > > > > After the rebuild, we still have a 4 level index (didn't eliminate > sufficient entries to reduce the level) but now have ½ the previous leaf > nodes. Now we require: > > > > 3 LIOs to navigate the index branches > > 5 LIOs to read all the necessary index entries from the index > leaf nodes > > 1000 LIOs to access all the row data stored in the table > > > > Total 1008 LIOs (or an improvement of 0.49%) > > > > This improvement is only within the SQL. We still have the same parsing > overheads, network overheads, processing within the application, etc. etc. > so the total net effect of response time would be substantially less. > However even assuming this improvement across the board, a (say) 10 sec > application response time has been improved by this index rebuild by 0.049 > of a sec. > > > > Hardly an improvement worth writing to mum about and this with an index that > had a pct_used of only 50% and a range scan that returns a (relatively > large) 1000 rows. Now if only we could spend the effort to reduce the row > accesses down to 10 rows, then dear mum might be more excited ... > > > > If this were a unique scan there would be NO difference in LIOs. None. > However by having double the necessary leaf nodes, we might decrease the > likelihood of finding the index blocks in cache and increase the likelihood > of pushing out other favourable objects from cache, which could result in > additional physical I/O. That said, if this were a popular index, the odds > of the required blocks being cached is high and considering you actively > promote caching of entire databases, it's an issue I won't dwell on ;) > > > > So for an index rebuild to be justified and for it to have a noticeable > effect on performance, it requires a massive proportion of unused space to > be reclaimed (rare considering the workings of b*trees as discussed) AND it > requires very large numbers of index blocks to be accessed by the > applications. > > > > So if the above index were used by an important batch program and accessed > via a fast full index scan, then our story could be different. Lets say the > entire index has been reduced from 100,000 index blocks down to 50,000 index > blocks after the rebuild. That's a reduction of 50000 blocks to be read or > 50% which might be a noticeable result (of course the multiblock read stuffs > up my nice LIO count somewhat ;) > > > > However, you get my point. Now we have a scenario where we have a > significant amount of unused space (50%) AND a significant number of index > blocks (100%) that we wish to access. > > > > To determine whether an index rebuild has been justified is relatively > straight forward. Has performance improved on the key applications that > depend on the rebuilt index(es). This can be monitored in a number of ways. > > > > I know of one previous manager who, with a stop watch in hand, would > periodically time end user operations. If they took longer than expected, > watch out. Although crude, it does kinda make a point in that overall > response time is the issue. If by rebuilding an index, various statistics > and space utilization ratios look better, it means zip if nothing actually > appears to run faster. > > > > Therefore you need to store metrics beforehand, when things were running > slower and then make comparisons after the index rebuild. Has it actually > helped ? These metrics could be in the form of: > > > > Managers with stop watches > > Timings of corresponding code through SQL*PLUS > > Timings as generated directly by applications/batch jobs > > Trace Files that document execution statistics, execution > timings and wait timings (preferred) > > etc .. > > > > The usual care needs to be taken ensure that any changes in timings can be > attributed to the index rebuild and not other changed variables such as > different database load, other structural changes, etc. That's why I like > the trace file method where you can see what is causing what to wait and for > how long, etc. Also such timings need to continue periodically to see how > long any possible performance benefits continue. However, the point is such > improvements need to be measurable, else what's the point. > > > > Finally, I just want to make the point that rebuilding indexes (and perhaps > just as importantly generating statistics such as you suggest with validate > structure commands) is not cheap. It chews up heaps of resources and > generates various locking issues, particularly validate structure which > locks the entire table during it's duration (the online option ain't much > use from a generating stats point of view) but even index rebuilds can be > troublesome. If you have the spare resources and/or you have the > availability, great go for it, but if you don't then pointless index > rebuilds need to be avoided. > > > > It all comes back to the question of do the pros and the benefits of index > rebuilds justify the cons and the costs of rebuilding the buggers. > > > > Don, this is not rocket science, it's all just common sense really. Your > article suggests that this is all somehow mysterious, ambiguous, that > rebuilds sometimes seem to help but for some spooky reason nobody knows why. > This is not the case at all. Index rebuilds are beneficial sometimes because > the resultant reduction in LIOs results in either less overheads when using > the index or in some cases in the index being used in the first place. Index > rebuilds generally are not beneficial because there is generally not enough > reduction in LIOs for it to be noticeable to you, or I or to mum or to the > end users, etc." > > > > and after a different version of the article appeared I made the following > points: > > > > "I notice that your Index article has changed yet again (up to version 3 now > ?), unfortunately re-introducing many of the inaccuracies I previously > highlighted. > > > However, this time, you've used the index metrics to create what you > describe as "very interesting reports". Interesting indeed !! In my mission > to get this article of yours to a professional standard, let me add these > points to my ever increasing list of issues with your article: > a.. There is no such table as idx_stats. Do you means index_stats or do > you mean your index_details table ? > b.. You reference a column called sum_key_len which isn't defined anywhere > probably because there's no such column and that's probably because if it's > meant to represent the length of an index entry, it's a variable value > dependent on each individual index entry. Therefore the manner in which it's > used throughout this report is incorrect and will produce inaccurate > results. > c.. The "Blocks" column C2 specifies all blocks allocated to the index > segment including those blocks above the HWM. You do realize that other than > perhaps wasting space, blocks above the HWM do not impact index performance > at all ... > d.. The "Dense Full Block Space" column C7 is defined incorrect and is > totally meaningless as it: > a.. doesn't consider the "unusable" portion of leaf blocks (block header > and the such) > b.. doesn't consider the full space required for an index entry (rowid, > lock bytes, length bytes, etc) > c.. doesn't consider the space required for branch blocks > d.. incorrectly computes the space used as the "number of rows" * "sum > of the key lengths" (which as mentioned is both undefined and variable so is > an inaccurate way of determining the space required by the index) > e.. incorrectly multiples (rather than divides) this meaningless figure > by the pct_free less space > What you have here is a number that's equivalent to a random number > multiplied by your birthdate, of some mild interest but of no relevance when > discussing index characteristics !! > A more accurate formula would be: > > ceil((lf_rows_len - del_lf_rows_len) / lf_blk_len) + ceil((br_rows_len > / br_blk_len)) / ((100 - pct_free)/100) > > if what you're trying to do is approximate how many blocks this index > would use if rebuilt with its current pct_free value (I'm assuming at least > a level 2 index). > > a.. The next column "Percent Free Blocks" C11 is also totally meaningless > for all the above reasons *and* because you're calculating the approximate > "wasted" blocks within the index structure by using the "blocks" statistic > which as mentioned earlier includes all blocks above the HWM. An index that > consists of just one block but has an initial extent of 1M would appear a > possible candidate for a rebuild but it would be a bit of a pointless > exercise. Blocks above the HWM do not effect the efficiency of the index, > invalidating the purpose of what you're trying to represent here. Rather > than blocks, I would suggest lf_blks + br_blks would be more appropriate and > meaningful value that determines the number of blocks actually in the > current index structure. > b.. The column "Computed Empty Block" C10 is (you guess it) inaccurate and > totally meaningless. You again insist on incorrectly multiplying del_lf_rows > by the non-existent/non meaningful sum_key_len rather than just using > del_lf_rows_len (which you're trying to compute anyway) and you're still > dividing by the full blocksize rather than the more meaningful lf_blk_len > (the usable block size). Your C10 therefore should look like: > (del_lf_rows_len / lf_blk_len)" > > > Hopefully these comments will do some good not only to Don but to anyone > trying to understand this whole issue. > > Regards > > Richard Foote > ----- Original Message ----- > To: "Multiple recipients of list ORACLE-L" <[EMAIL PROTECTED]> > Sent: Saturday, December 06, 2003 6:29 AM > > > > Tanel, > > > > I think you're saying a query almost always runs faster right after the > index > > rebuild and there's no point in finding the criterion whether to rebuild > an > > index. (What is "42"?) > > > > Some time ago I posted a message somewhere else showing a case where > rebuilding > > or coalescing an index may be benefitial. A data warehouse is found to > have > > some data errors. Deletes and updates are done. Then the database goes to > > mostly read-only again, and will last for a month or quarter. Then > shrinking > > frequently used B*Tree indexes is a good idea. Now I'd like to add one > more > > criterion as a result of reading Jonathan Lewis' dbazine article and email > with > > him (errors are mine): the index is full scanned, or if range scanned or > unique > > scanned, the index selectivity has to be fairly low (but not too low for > the > > index to be ignored by CBO). > > > > In a typical working environment, a data warehouse does have plenty of > > relatively quiet period. I worked on a monthly data load project at an > > insurance company. I remember we rebuilt a partitioned IOT (one partition > at a > > time) and fast full index scan (certain partitions) did run faster. > > > > There're some errors in Don Burleson's dbazine article (e.g. pct_used in > > dba_indexes) and Mike Hordila's Oramag article (structurally unbalanced > index). > > But one thing alluded to in there is important: study Oracle performance > > problems as scientific research. You said setting _wait_for_sync to false > > improves performance. That's a fact. We can only explain and analyze it > but not > > deny it. Similarly, when Mike says queries run 10 to 50% faster after > index > > rebuild, we can't deny unless we find his measurement is wrong. Wouldn't > it be > > nice if Oracle researchers write articles with sections like Abstract - > > Experimental - Results - Discussion in that order? > > > > Yong Huang > > > > Tanel Poder wrote: > > > > There's no point of arguing about whether a query ran faster right after > you > > rebuilt your index. Nor there is no point in finding some ultimate > algorithm > > for finding the point of index rebuilding, we all know the answer - it's > > "42". > > > > Instead, a long stress test has to be done, e.g. running 10 millions of > > continous transactions and queries (simulating real life). Do one 10M > > without rebuilding indexes in the meantime, measure total execution time, > IO > > amount, CPU usage, segment sizes etc. > > > > Then restore your database back to starting point and do the same test > again > > with regular index rebuilds during the operations (online or taking > "users" > > offline, depending on environment type). And then measure the same > > statistics, especially total execution time. Note, that statistics and > time > > also for rebuilding indexes should be accounted in totals, because in real > > life they don't just disappear somewhere as in some simple-minded tests. > > > > Tanel. > > > > __________________________________ > > Do you Yahoo!? > > Free Pop-Up Blocker - Get it now > > http://companion.yahoo.com/ > > -- > > Please see the official ORACLE-L FAQ: http://www.orafaq.net > > -- > > Author: Yong Huang > > INET: [EMAIL PROTECTED] > > > > Fat City Network Services -- 858-538-5051 http://www.fatcity.com > > San Diego, California -- Mailing list and web hosting services > > --------------------------------------------------------------------- > > To REMOVE yourself from this mailing list, send an E-Mail message > > to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in > > the message BODY, include a line containing: UNSUB ORACLE-L > > (or the name of mailing list you want to be removed from). You may > > also send the HELP command for other information (like subscribing). > > > > > -- > Please see the official ORACLE-L FAQ: http://www.orafaq.net > -- > Author: Richard Foote > INET: [EMAIL PROTECTED] > > Fat City Network Services -- 858-538-5051 http://www.fatcity.com > San Diego, California -- Mailing list and web hosting services > --------------------------------------------------------------------- > To REMOVE yourself from this mailing list, send an E-Mail message > to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in > the message BODY, include a line containing: UNSUB ORACLE-L > (or the name of mailing list you want to be removed from). You may > also send the HELP command for other information (like subscribing). > -- Please see the official ORACLE-L FAQ: http://www.orafaq.net -- Author: Tanel Poder INET: [EMAIL PROTECTED] Fat City Network Services -- 858-538-5051 http://www.fatcity.com San Diego, California -- Mailing list and web hosting services --------------------------------------------------------------------- To REMOVE yourself from this mailing list, send an E-Mail message to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing).