On Thu, Nov 29, 2001 at 11:34:24AM -0800, Chris Thorpe wrote:
> 
> On Thu, 29 Nov 2001, Yanick wrote:
> 
> > On Thu, Nov 29, 2001 at 01:34:02PM -0500, Michael G Schwern wrote:
> >
> >>> Yesterday, I saw an interesting related exercise.  Write a program that
> >>> reads the lines from a file and outputs the middle line.  The kicker is
> >>> that you can't use arrays.
> > >
> > > I'll interpret that as O(1) memory, O(n) time.
> 
>   You can't do it in O(1) memory and O(n) time.
<snip>
>   If you elect to use O(1) memory, then you have to use O(1.5*n) time,
> as the already submitted examples do.
> 
>   It seems silly to try to use O(n/2) memory if you aren't allowed to use
> an array.

O(n) == O(1.5*n) == O(n/2).


A Crash Course On BigO Notation As Presented By A Guy Who Failed
Fundamental Data Structures And Algorithms 1 *

    O(1) means it runs in a fixed amount of time regardless of how much
         data is to be processed.  A hash lookup is O(1); it takes the
         same time no matter how many keys the hash holds.  (There's a
         short sketch of these first three after the list.)

    O(logn) means as you increase the amount of data, the time required
            will increase logarithmically.  To double the run time you
            have to square the amount of data.  Binary search on a
            sorted list is O(logn).

    O(n) means that as you increase the amount of data, the time
         required will increase in proportion.  Double the data,
         double the run time.  Iterating through the lines of a file
         or the elements of an array is O(n).

    O(n**2) [order n squared] means that as you increase the amount of
            data, the time required will increase quadratically.
            Double the data, quadruple the run time.

    etc...
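
If it helps to see those in code, here's a quick Perl sketch of the
first three (names like %age and binary_search are made up for
illustration):

    # O(1): a hash lookup costs the same whether %age holds ten
    # keys or ten million.
    my %age = ( alice => 31, bob => 42 );
    my $bobs_age = $age{bob};

    # O(n): scanning an array touches every element in the worst
    # case -- double the array, double the work.
    my @names = ( 'alice', 'bob', 'carol' );
    my ($found) = grep { $_ eq 'carol' } @names;

    # O(logn): a binary search halves the remaining range on each
    # pass through the loop.
    sub binary_search {
        my ($sorted, $target) = @_;    # arrayref of sorted numbers
        my ($lo, $hi) = (0, $#$sorted);
        while ($lo <= $hi) {
            my $mid = int(($lo + $hi) / 2);
            return $mid if $sorted->[$mid] == $target;
            $sorted->[$mid] < $target ? ($lo = $mid + 1) : ($hi = $mid - 1);
        }
        return -1;    # not found
    }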

Here's the tricky part... constants are ignored.  An algorithm that
reads each line in a file once is O(n).  An algorithm that reads each
line in a file twice is O(n).  Even if it reads each line 10,000 times
it's still O(n).  This is because if you double the size of the file,
you only double the run-time.

So O(1.5*n) == O(n).

BigO is an expression of how the algorithm will perform as the data
size approaches infinity, and it's usually quoted for the worst case.
As such it's a tad conservative.  There's also best-case and
average-case analysis, with their own symbols... big Omega and big
Theta, something like that.  A basic quicksort is O(n**2), but that
worst case only shows up on pathological input, like an already sorted
array fed to a naive pivot choice.  Its average case is O(nlogn).
Most practical quicksort implementations have safeguards to prevent
the worst case.
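
To make that concrete, here's a bare-bones recursive quicksort in Perl
(a sketch for illustration only, using the naive "first element as
pivot" choice):

    sub naive_quicksort {
        my @list = @_;
        return @list if @list <= 1;

        # Naive pivot choice: always grab the first element.
        my $pivot   = shift @list;
        my @less    = grep { $_ <  $pivot } @list;
        my @greater = grep { $_ >= $pivot } @list;

        return (naive_quicksort(@less), $pivot, naive_quicksort(@greater));
    }

On random input each pivot splits the list roughly in half, which is
where the O(nlogn) average comes from.  Hand it an already sorted list
and every pivot is the smallest element left, so one side of every
split is empty and you recurse n times over n elements -- that's the
O(n**2) worst case.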

So when someone says "You're allowed O(1) memory and O(n) time to read
this file" it means that no matter how big the file, your algorithm
can only use a fixed amount of memory.  And if the size of the file
doubles, your program will run in twice the time.
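
Which brings us back to the middle-line exercise.  Here's one way to
do it under those constraints -- a two-pass sketch, assuming the
filename is in $file: count the lines, then read again and stop
halfway.  That's roughly 1.5*n reads, which as we just said is still
O(n), and it never holds more than one line in memory.

    # Pass 1: count the lines (n reads, O(1) memory).
    open my $fh, '<', $file or die "Can't open $file: $!";
    my $count = 0;
    $count++ while <$fh>;
    close $fh;

    # Pass 2: read again and stop at the middle line (about n/2 reads).
    my $middle = int(($count + 1) / 2);
    open $fh, '<', $file or die "Can't open $file: $!";
    while (my $line = <$fh>) {
        if ($. == $middle) {    # $. is the current line number
            print $line;
            last;
        }
    }
    close $fh;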


Constants become useful again when comparing the run-time of
algorithms with the same BigO notation.  They also matter when the
amount of data involved is relatively small, or when the constant
factor in the "better" algorithm is prohibitively high.  This is why
this:

    my @sorted_keys = sort keys %hash;

is usually more efficient than trying to replace the %hash with some
structure that keeps the keys always sorted, like a tree.
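
For comparison, here's a sketch of what "keep it always sorted" looks
like if you roll it by hand with a plain array (insert_sorted is a
hypothetical helper, just to show the bookkeeping):

    sub insert_sorted {
        my ($keys, $new_key) = @_;

        # Binary search for the insertion point: O(logn) compares...
        my ($lo, $hi) = (0, scalar @$keys);
        while ($lo < $hi) {
            my $mid = int(($lo + $hi) / 2);
            if ($keys->[$mid] lt $new_key) { $lo = $mid + 1 }
            else                           { $hi = $mid }
        }

        # ...but the splice still shifts up to n elements, and you pay
        # that on every single insert.
        splice @$keys, $lo, 0, $new_key;
    }

Unless the hash is huge and you need the sorted view after every
insert, one sort at the end wins.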


* If you check under your seats you'll find a packet of salt tablets.
Try not to mix them up with the suicide pill.


-- 

Michael G. Schwern   <[EMAIL PROTECTED]>    http://www.pobox.com/~schwern/
Perl Quality Assurance      <[EMAIL PROTECTED]>         Kwalitee Is Job One
IMHO bugs in Perl 5 shouldn't carry over to Perl 6. (Unless, of course,
we *like* the bugs... ;)
        -- Ken Fox in <[EMAIL PROTECTED]>
