Jared,

I think what you've discovered is just another demonstration
of the fact that different functionality is appropriate
in different circumstances.

Imagine replacing your v$sql_workarea_histogram
with a chunky SQL statement that crunched through
a massive table, producing a small result set.

In those circumstances, your analytic approach would
sort a small result set twice, having done just one big crunch;
with the group by approach, you would have to crunch the big
data set twice.  I know which option would be cheaper.
(You then have to wonder whether you could produce
the small result set just once using subquery factoring - the
'with subquery' - as another possible optimisation strategy.)
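
As a sketch of that idea - untested, and assuming you are on 9.2,
where subquery factoring is available, so treat it as an outline
rather than a verified rewrite:

```sql
with wh as (
        select
                low_optimal_size/1024 low_optimal_size_kb
                , (high_optimal_size+1)/1024 high_optimal_size_kb
                , optimal_executions
                , onepass_executions
                , multipasses_executions
                , total_executions
        from v$sql_workarea_histogram
        where total_executions != 0
)
select
        wh.*
        , optimal_executions /
          (select sum(optimal_executions) from wh) * 100
                pct_optimal_executions
from wh
order by low_optimal_size_kb
/
```

The hope being that Oracle materialises the factored subquery once
and then reads the small result set twice, rather than crunching
the big data set twice.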

BTW - did you notice that Oracle didn't do a sort for
the order by in the GROUP BY example, because
the optimizer could infer that the data had already been
ordered by the GROUP BY?  That's the reason why
your GROUP BY example did less sorting.  (I'm not
sure you need the GROUP BY at all, though I may be
missing something.)
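
For example - on the assumption that v$sql_workarea_histogram
returns one row per bucket anyway, so the rows are already
distinct - the join version could (I think, untested) drop the
GROUP BY completely:

```sql
select
        h.low_optimal_size/1024 low_optimal_size_kb
        , (h.high_optimal_size+1)/1024 high_optimal_size_kb
        , h.optimal_executions
        , h.onepass_executions
        , h.multipasses_executions
        , h.total_executions
        , h.optimal_executions / hs.sum_optimal_executions * 100
                pct_optimal_executions
from v$sql_workarea_histogram h,
        (
                select sum(optimal_executions) sum_optimal_executions
                from v$sql_workarea_histogram
        ) hs
where h.total_executions != 0
order by low_optimal_size_kb
/
```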

BTW-2:  in the analytic clause, the (partition by 1) is not
necessary; you can write:
>                 , sum(optimal_executions) over ( )
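
So, hypothetically (I haven't run this), the inner query of your
analytic version could be reduced to:

```sql
select
        low_optimal_size/1024 low_optimal_size_kb
        , (high_optimal_size+1)/1024 high_optimal_size_kb
        , optimal_executions
        , onepass_executions
        , multipasses_executions
        , total_executions
        , sum(optimal_executions) over () sum_optimal_executions
from v$sql_workarea_histogram
where total_executions != 0
```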


Regards

Jonathan Lewis
http://www.jlcomp.demon.co.uk

  The educated person is not the person
  who can answer the questions, but the
  person who can question the answers -- T. Schick Jr


One-day tutorials:
http://www.jlcomp.demon.co.uk/tutorial.html


Three-day seminar:
see http://www.jlcomp.demon.co.uk/seminar.html
____UK___November


The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html


----- Original Message ----- 
To: "Multiple recipients of list ORACLE-L" <[EMAIL PROTECTED]>
Sent: Friday, December 05, 2003 12:24 AM


> While working on some scripts to monitor PGA usage on 9i, I came across
> something interesting while experimenting with different forms of SQL.
>
> I have recently been forcing myself to make use of 'OVER..PARTITION BY' in
> SQL so as to be more comfortable in using it.  Can't add new tools to the
> box until I know how to use them.  :)  Yes, I know I should have been
> using them long ago.
>
> Anyway, I thought it might be interesting to compare the forms of SQL
> with and without the use of OVER...PARTITION BY.
>
> This SQL can be run on any instance that has PGA_AGGREGATE_TARGET set.
>
> Here is the SQL using OVER:
>
> select
>         low_optimal_size_kb
>         , high_optimal_size_kb
>         , optimal_executions
>         , onepass_executions
>         , multipasses_executions
>         , total_executions
>         , optimal_executions / sum_optimal_executions * 100 pct_optimal_executions
> from (
>         select
>                 low_optimal_size/1024 low_optimal_size_kb
>                 , (high_optimal_size+1)/1024 high_optimal_size_kb
>                 , optimal_executions
>                 , onepass_executions
>                 , multipasses_executions
>                 , total_executions
>                 , sum(optimal_executions) over ( partition by 1 ) sum_optimal_executions
>         from v$sql_workarea_histogram
>         where total_executions != 0
> ) a
> order by low_optimal_size_kb
> /
>
> and here is the SQL using good old GROUP BY
>
> select
>         low_optimal_size_kb
>         , high_optimal_size_kb
>         , optimal_executions
>         , onepass_executions
>         , multipasses_executions
>         , total_executions
>         , optimal_executions / sum_optimal_executions * 100 pct_optimal_executions
> from (
>         select
>                 h.low_optimal_size/1024 low_optimal_size_kb
>                 , (h.high_optimal_size+1)/1024 high_optimal_size_kb
>                 , h.optimal_executions
>                 , h.onepass_executions
>                 , h.multipasses_executions
>                 , h.total_executions
>                 , hs.sum_optimal_executions
>         from v$sql_workarea_histogram h,
>                 (
>                         select sum(optimal_executions) sum_optimal_executions
>                         from v$sql_workarea_histogram
>                 ) hs
>         where h.total_executions != 0
>         group by h.low_optimal_size/1024
>                 ,(h.high_optimal_size+1)/1024
>                 , h.optimal_executions
>                 , h.onepass_executions
>                 , h.multipasses_executions
>                 , h.total_executions
>                 , hs.sum_optimal_executions
> ) a
> order by low_optimal_size_kb
> /
>
>
> The new version is significantly simpler.
>
> It then seemed that it might be interesting to compare the performance and
> scalability of the two methods.
>
> This is where it gets interesting.
>
>
> 16:10:47 rsysdevdb.radisys.com - [EMAIL PROTECTED] SQL> @run_stats
>
> NAME                                           RUN1       RUN2       DIFF
> ---------------------------------------- ---------- ---------- ----------
> LATCH.lgwr LWN SCN                                1          0         -1
> LATCH.mostly latch-free SCN                       1          0         -1
> LATCH.undo global data                            1          0         -1
> STAT...active txn count during cleanout           1          0         -1
> STAT...consistent gets                            5          4         -1
> STAT...db block gets                             28         29          1
> STAT...enqueue requests                           1          0         -1
> STAT...redo entries                              17         18          1
> STAT...deferred (CURRENT) block cleanout          4          3         -1
>  applications
>
> STAT...consistent gets - examination              1          0         -1
> STAT...cleanout - number of ktugct calls          1          0         -1
> STAT...calls to kcmgcs                            7          6         -1
> STAT...calls to get snapshot scn: kcmgss       1006       1005         -1
> LATCH.Consistent RBA                              2          0         -2
> STAT...recursive cpu usage                       29         31          2
> LATCH.redo allocation                            20         18         -2
> LATCH.cache buffers chains                      102        105          3
> LATCH.redo writing                                4          0         -4
> LATCH.library cache                            2014       2008         -6
> LATCH.library cache pin                        2012       2006         -6
> LATCH.messages                                    8          0         -8
> STAT...redo size                              27096      27508        412
> STAT...sorts (memory)                          1004       2004       1000
> LATCH.SQL memory manager workarea list l          0       2000       2000
> atch
>
> STAT...workarea executions - optimal           2008       4008       2000
> STAT...sorts (rows)                            6112      10112       4000
>
> 26 rows selected.
>
> RUN1 is the GROUP BY SQL
> RUN2 is the OVER...PARTITION BY SQL
>
> The OVER version of the SQL is significantly more expensive in terms of
> sorting and latching.
>
> Has anyone else noticed this?
>
> Or perhaps my use of OVER..PARTITION BY needs some optimization, which is
> clearly in the realm of possibility.  :)
>
> Jared
>
>
>

-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.net
-- 
Author: Jonathan Lewis
  INET: [EMAIL PROTECTED]

