RE: CBO and cartesian product

2003-10-13 Thread John Kanagaraj
Tim/Dilip,

Unfortunately, as this is an 'Apps' instance, the parameters
DB_FILE_MULTIBLOCK_READ_COUNT should be set to 8 and the
OPTIMIZER_INDEX_CACHING parameter should *not* be set (letting it
default)... This is as per ML Note 216205.1 - non compliance = "not
supported".

Since this is a customized report though, you *may* be able to get away with
setting them within the program (or reverting to RULE as a quick fix)

John Kanagaraj
DB Soft Inc
Phone: 408-970-7002 (W)

Listen to great, commercial-free christian music 24x7x365 at
http://www.klove.com

** The opinions and facts contained in this message are entirely mine and do
not reflect those of my employer or customers **

>-Original Message-
>From: Tim Gorman [mailto:[EMAIL PROTECTED] 
>Sent: Saturday, October 11, 2003 11:14 AM
>To: Multiple recipients of list ORACLE-L
>Subject: Re: CBO and cartesian product
>
>
>Here is the short answer:
>=
>
>   * Set OPTIMIZER_INDEX_CACHING to 90
>   * Make sure that DB_FILE_MULTIBLOCK_READ_COUNT is not overly high
>   * Also, consider gathering column-level statistics on some of the
> indexed columns involved, especially if the query in question
> uses literal data values on them
>
>Here is the long answer:
>
>
>Starting in the 8i timeframe, the CBO started borrowing some 
>techniques from
>data warehouse STAR joins when confronted with any type of query that
>traversed two different entity-relationship heirarchies 
>starting from the
>same table.
>
>Say you have three tables (to keep it simple).  One table is a "child"
>entity to the other two tables, which are both "parent" entities in ERD
>terms.  The CBO detects that both "parent" tables are much 
>smaller than the
>"child" table.
>
>OK, so there is no relationship between the two parent tables 
>-- they are
>both "related" only through the large child table.
>
>Now, think about what traditional join methods are possible:
>
>1) start with one of the parent tables as the "driving table", do a
>   indexed nested-loop range-scan during the join to the 
>child table,
>   and then perform indexed nested-loop unique-scan during 
>the final
>   join to the other parent table
>2) reverse the order of option #1.  Start with the other parent
>   table, join to the child, and then join up to the 
>remaining parent
>3) start with the child table and join up (via indexed 
>unique-scans)
>   to the two parent tables
>
>The weak point of both of these options is probably the access 
>of the child
>table.  Plain and simple, it is difficult to efficiently get 
>rows from it.
>It is likely that the index supporting the foreign-key 
>relationship from
>either parent table is not very efficient by itself, resulting 
>in a very
>expensive range-scan, requiring a massive number of logical 
>I/Os and "cost"
>calculated by the CBO.
>
>So, the CBO in 8i started utilizing another option, which 
>initially blew my
>mind first time I saw it happen.  It was the point which I 
>realized that the
>CBO was _way_ smarter than humans...
>
>This additional option is to perform a "cartesian join" between the two
>parent tables, to come up with one result set.  Then, using 
>the filtered
>cartesian result set from that join, the CBO probes into the 
>large child
>table using the _combined_ keys from both parent tables!
>
>Rather brilliant choice, in most cases.  The cartesian join, despite
>everybody's visceral fear of it, is actually rather 
>insignificant if the two
>parent tables are small.  And it is even smaller if there are good
>"filtering" predicates on those tables in the WHERE clause.
>
>So, instead of having to retrieve rows from the large child 
>table using one
>or the other of the relatively ineffective indexes supporting 
>each foreign
>key, the CBO merges and uses both keys, resulting in a far 
>more effective
>access method into the child table.
>
>So, chances are good that this is the situation you are 
>facing.  Is this
>correct?  Can you verify the basic relationships between the tables
>involved?
>
>So, now the question is:  why did the CBO make the wrong choice?
>
>First, the default setting of the OPTIMIZER_INDEX_CACHING 
>parameter (i.e.
>"0") represents a flaw in the basic costing algorithm used by the CBO.
>Setting the parameter to 90 or so fixes this flaw.  For a more detailed
>explanation, please feel free to view my paper "Search for 
>Intelligent Life
>in the CBO", available online at "http://www.EvDBT.com/papers.htm";

Re: CBO and cartesian product

2003-10-11 Thread Tim Gorman
Here is the short answer:
=

   * Set OPTIMIZER_INDEX_CACHING to 90
   * Make sure that DB_FILE_MULTIBLOCK_READ_COUNT is not overly high
   * Also, consider gathering column-level statistics on some of the
 indexed columns involved, especially if the query in question
 uses literal data values on them

Here is the long answer:


Starting in the 8i timeframe, the CBO started borrowing some techniques from
data warehouse STAR joins when confronted with any type of query that
traversed two different entity-relationship heirarchies starting from the
same table.

Say you have three tables (to keep it simple).  One table is a "child"
entity to the other two tables, which are both "parent" entities in ERD
terms.  The CBO detects that both "parent" tables are much smaller than the
"child" table.

OK, so there is no relationship between the two parent tables -- they are
both "related" only through the large child table.

Now, think about what traditional join methods are possible:

1) start with one of the parent tables as the "driving table", do a
   indexed nested-loop range-scan during the join to the child table,
   and then perform indexed nested-loop unique-scan during the final
   join to the other parent table
2) reverse the order of option #1.  Start with the other parent
   table, join to the child, and then join up to the remaining parent
3) start with the child table and join up (via indexed unique-scans)
   to the two parent tables

The weak point of both of these options is probably the access of the child
table.  Plain and simple, it is difficult to efficiently get rows from it.
It is likely that the index supporting the foreign-key relationship from
either parent table is not very efficient by itself, resulting in a very
expensive range-scan, requiring a massive number of logical I/Os and "cost"
calculated by the CBO.

So, the CBO in 8i started utilizing another option, which initially blew my
mind first time I saw it happen.  It was the point which I realized that the
CBO was _way_ smarter than humans...

This additional option is to perform a "cartesian join" between the two
parent tables, to come up with one result set.  Then, using the filtered
cartesian result set from that join, the CBO probes into the large child
table using the _combined_ keys from both parent tables!

Rather brilliant choice, in most cases.  The cartesian join, despite
everybody's visceral fear of it, is actually rather insignificant if the two
parent tables are small.  And it is even smaller if there are good
"filtering" predicates on those tables in the WHERE clause.

So, instead of having to retrieve rows from the large child table using one
or the other of the relatively ineffective indexes supporting each foreign
key, the CBO merges and uses both keys, resulting in a far more effective
access method into the child table.

So, chances are good that this is the situation you are facing.  Is this
correct?  Can you verify the basic relationships between the tables
involved?

So, now the question is:  why did the CBO make the wrong choice?

First, the default setting of the OPTIMIZER_INDEX_CACHING parameter (i.e.
"0") represents a flaw in the basic costing algorithm used by the CBO.
Setting the parameter to 90 or so fixes this flaw.  For a more detailed
explanation, please feel free to view my paper "Search for Intelligent Life
in the CBO", available online at "http://www.EvDBT.com/papers.htm";.

Changing that alone may cause the CBO to rethink its decision to go with the
derived STAR-join scheme involving a cartesian join, and instead choose the
indexed nested-loops scheme which is the __only__ possible choice by the
RBO.  By discounting the cost of index-based access methods, the CBO (which
considers _all_ possible access methods and chooses the one with the lowest
cost) may now choose the index-based plan.  Once again, the RBO only
considered the one plan, which in this case turned into a bit of luck for
the RBO, making it look good.

You can experiment with this parameter change using ALTER SESSION, if you
like.  This is one of the _few_ occasions on which changing a parameter
actually has an impact on performance.

There are some other parameter settings which may impact how the CBO costs
this query.  For example, if DB_FILE_MULTIBLOCK_READ_COUNT is set higher
than its default value of 8 or 16, then the CBO will think that access plans
involving FULL table scans are cheaper than they are.

Another possible cause for the CBO's bad decision is it's default assumption
that data values in a column are evenly distributed.  Gathering stats for
indexed columns only gathers the number of distinct keys and the low/high
values, by default.  Therefore, the CBO has no choice except to assume an
even distribution of data, that each distinct data value is used by an equal
number of rows.  Gathering column-level statistics creates "histograms" in

CBO and cartesian product

2003-10-11 Thread Dilip
Hi List,

DB version - 8174. (Oracle Apps 11.5.4).

One of the customized report started running very slowly. One query was taking more 
than 3 GB of TEMP tablespace for 'HASH' type segment and erroring out after 2 hours 
for a lack of space in TEMP. 

Tkprof showed that it is doing Cartesian Sort Merge Join. 
After running the same report under RBO, it is using Nested loops and report is 
completing in just 5 minutes.
The stats were collected last week-end. I couldn't find any reason for this CBO's 
behavior ? Has anybody experienced this before ?

Thanks,
~Dilip


 



Get Your Private, Free E-mail from Indiatimes at http://email.indiatimes.com

 Buy The Best In BOOKS at http://www.bestsellers.indiatimes.com

Bid for for Air Tickets @ Re.1 on Air Sahara Flights. Just log on to 
http://airsahara.indiatimes.com and Bid Now !

-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.net
-- 
Author: Dilip
  INET: [EMAIL PROTECTED]

Fat City Network Services-- 858-538-5051 http://www.fatcity.com
San Diego, California-- Mailing list and web hosting services
-
To REMOVE yourself from this mailing list, send an E-Mail message
to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).