Re: Naive question on lists of duplicates

Dylan Thurston Sat, 07 Jun 2003 11:39:37 -0700

On Thu, Jun 05, 2003 at 08:09:02AM -0500, Stecher, Jack wrote:
> I have an exceedingly simple problem to address, and am wondering if
> there are relatively straightforward ways to improve the efficiency
> of my solution.


Was there actually a problem with the efficiency of your first code?

> The task is simply to look at a lengthy list of stock keeping units
> (SKUs -- what retailers call individual items), stores, dates that a
> promotion started, dates the promotion ended, and something like
> sales amount; we want to pull out the records where promotions
> overlap.  I will have dates in yyyymmdd format, so there's probably
> no harm in treating them as Ints.

(Unless this is really a one-shot deal, I suspect using Ints for dates
is a bad decision...)

> My suggestion went something like this (I'm not at my desk so I
> don't have exactly what I typed):

I have a different algorithm, which should be nearly optimal, but I
find it harder to describe than to show the code (which is untested):

> import List(sortBy, insertBy)
>
> data PromotionRec  = PR {sku :: String, store :: String, startDate :: Int, endDate 
> :: Int, amount::Float}
>
> compareStart, compareEnd :: PromotionRec -> PromotionRec -> Ordering
> compareStart x y = compare (startDate x) (startDate y)
> compareEnd x y = compare (endDate x) (endDate y)

> overlap :: [PromoRec] -> [[PromoRec]]
> overlap l = filter (lambda l. length l > 1) 
>                (overlap' [] (sortBy compareStart l))
>
> overlap' _ [] = []
> overlap' active (x:xs) =
>   let {active' = dropWhile (lambda y. endDate y < startDate x) active} in
>   (x:active') : overlap' (insertBy compareEnd x active') xs

The key is that, by keeping a list of the currently active promotions
in order sorted by the ending date, we only need to discared an
initial portion of the list.

You could get a moderately more efficient implementation by keeping
the active list as a heap rather than a list.

Peace,
        Dylan

pgp00000.pgp
Description: PGP signature

Re: Naive question on lists of duplicates

Reply via email to