Naive question on lists of duplicates

Stecher, Jack Thu, 05 Jun 2003 23:16:58 -0700
Hi, all.
 
I have an exceedingly simple problem to address, and am wondering if there are 
relatively straightforward ways to improve the efficiency of my solution.
 
The task is simply to look at a lengthy list of stock keeping units (SKUs -- what 
retailers call individual items), stores, dates that a promotion started, dates the 
promotion ended, and something like sales amount; we want to pull out the records 
where promotions overlap.  I will have dates in yyyymmdd format, so there's probably 
no harm in treating them as Ints.
 
My suggestion went something like this (I'm not at my desk so I don't have exactly 
what I typed):
 
> data PromotionRec  = PR {sku :: String, store :: String, startDate :: Int, endDate 
> :: Int, amount::Float}
>
> match                      ::  PromotionRec -> [PromotionRec] -> [PromotionRec]
> match _ []               = []
> match x (x:xs)         =  filter (meet x) xs
>
> nomatch                 ::  PromoRec -> [PromotionRec] -> [PromotionRec]
> nomatch _ []          = []
> nomatch x (x:xs)    =  filter (nomeet x) xs
>
> meet                      :: PromoRec -> PromoRec -> Bool
> meet x y                = and [startDate x <= endDate y, startDate y  <= startDate x]
>
> nomeet                  :: PromoRec -> PromoRec -> Bool
> nomeet x y            = not (meet x y)
>
> overlaps                :: [PromoRec] -> [[PromoRec]]
> overlaps []            = []
> overlaps (x:xs)      =  (x : match x xs) : (overlaps (nomatch x xs))
>
> overlap                 :: [PromoRec] -> [[PromoRec]]
> overlap []             = []
> overlap (x:xs)       = filter g1 (overlaps (x:xs)) where
>                                    g1 ys = (length ys) > 1
 
What I sent might have been slightly different, and I might have some typos in the 
above (as I'm composing from memory at the keyboard), but that's the rough idea:  
treat this like a quicksort, pulling out the records that overlap the first record, 
consing the first record onto the list, and then consing the resulting list onto the 
same function applied to the records that didn't overlap.  As a last step, delete the 
records that were not on promotion while anything else was on promotion.
 
I'm pretty confident that this will be more efficient than my colleague's SAS code, as 
he was comparing each record to every other record (giving n (n-1) comparisons).  It 
seems like this, in the worst case where everything is on promotion at distinct times, 
will compare the first record to (n-1) records, the second to (n-2) records, etc., 
giving n (n-1)/2 comparisons.  Thus, while this is worst-case O(n^2), it seems like it 
should have at most half as much work to do as the earlier approach in SAS.  On the 
other hand, in the best case, when everything is on promotion at the same time, there 
should be something like n-1 comparisons made and then that should be it.  So it seems 
like, depending on how frequently promotions co-occur for this retailer, the above 
should be somewhere between O(n) and O(n^2) time complexity.  Since these people are 
always having some sale or other, I suspect this won't be much worse than O(n log n).
 
Is there anything that is an obvious improvement in efficiency -- some clever tricks 
with the fold functions, some way of loading the data into some sort of convenient 
structure first, etc -- that I could easily implement?
 
Please share your thoughts.
 
Many thanks,
Jack Stecher
[EMAIL PROTECTED]
ڲG¥”&Ÿzf¢–)à–+-«$zYBi÷¡jÉ–Z+‚m§ÿðÃZ²G¥–Šàþf¢–f§þX¬¶)ߣøZ²G¥•Ɵ
Naive question on lists of duplicates

Reply via email to