Re: middle line (was Re: Daily Perl FAQ...)

Uri Guttman Thu, 29 Nov 2001 12:22:48 -0800

>>>>> "MGS" == Michael G Schwern <[EMAIL PROTECTED]> writes:



  MGS> A Crash Course On BigO Notation As Presented By A Guy Who Failed
  MGS> Fundamental Data Structures And Algorithms 1 *

  MGS>     O(1) means it runs in a fixed amount of time regardless of how much
  MGS>          data is to be processed.  A hash is an O(1) algorithm.

actually given linear searches in buckets, most hashes are not truely
O(1) but close enough.

  MGS>     O(logn) means as you increase the amount of data, the time required
  MGS>             will increase logarithmicly.  In order to double the run time
  MGS>             you have to exponentially increase the data.

inverted definition. a simpler meaning is run time grows as the log of
the data size. but i see you did that as a counterpoint to this one:

  MGS>     O(n**2) [order n squared] means that as you increase the amount of
  MGS>             data, the amount of time required will increase exponentially.
  MGS>             Double the data, quadruple the run time.

  MGS> BigO is an expression of how the algorithm will perform as the data
  MGS> size approaches infinity.  It is also refered to as worst-case
  MGS> analysis.  As such it's a tad conservative.  There's also best-case
  MGS> analysis and average-case analysis, I forget the symbols... theta or
  MGS> something.  A basic quicksort is O(n**2), but this only happens if you
  MGS> try to quicksort an already sorted array.  It's average case is nlogn.
  MGS> Most practical quicksort implementations have safeguards to prevent
  MGS> their worst case.

another way to describe O() is how runtime changes as data size
changes. initial overhead and constant overhead per operation can be
neglected. so the O() function will reflect the shape of the growth
curve and not any absolute best way. a O(N**2) bubble sort will almost
always beat out any other sort if you have only a few elements to
sort. that is because of its simplicity and low overhead. but as the
data set grows its runtime will grow N**2 and that will eventually be
much slower than any typical O(Log N) sort.

note that the base of the log is never stated too. that is because you
can convert to any log base you want by multiplying with a constant and
constant factors are neglected. so an O(Log10 N) tree is the same as an
O(Log2 N) tree sort.

  MGS> Constants become useful again when comparing the run-time of
  MGS> algorithms with the same BigO notations.  They also become useful when
  MGS> the amount of data involved is relatively small or the constant involved
  MGS> in "better" algorithm is prohibatively high.  This is why this:

it is also why the GRT wins over the ST for larger data sets. the GRT
requires more overhead to set up and get out the data but it has a
faster compare operation. so when the data set gets large enough that
will win out. the overhead is linear and the sort is Log N. this also
means that O(Log N + N) is the same as O(Log N). that gets many
people. you only need to look at the highest growth function in O()
notation. it will eventually swamp out any lower growth function. both
the ST and GRT have a O(N) component but the O(Log N) of the actual sort
call is what counts.

uri

-- 
Uri Guttman  ------  [EMAIL PROTECTED]  -------- http://www.stemsystems.com
-- Stem is an Open Source Network Development Toolkit and Application Suite -
----- Stem and Perl Development, Systems Architecture, Design and Coding ----
Search or Offer Perl Jobs  ----------------------------  http://jobs.perl.org

Re: middle line (was Re: Daily Perl FAQ...)

Reply via email to