Re: Naive question on lists of duplicates

Sarah Thompson Thu, 05 Jun 2003 23:37:57 -0700

I'm pretty confident that this will be more efficient than my colleague's SAS code, as he was comparing each record to every other record (giving n (n-1) comparisons). It seems like this, in the worst case where everything is on promotion at distinct times, will compare the first record to (n-1) records, the second to (n-2) records, etc., giving n (n-1)/2 comparisons. Thus, while this is worst-case O(n^2), it seems like it should have at most half as much work to do as the earlier approach in SAS. On the other hand, in the best case, when everything is on promotion at the same time, there should be something like n-1 comparisons made and then that should be it. So it seems like, depending on how frequently promotions co-occur for this retailer, the above should be somewhere between O(n) and O(n^2) time complexity. Since these people are always having some sale or other, I suspect this won't be much worse than O(n log n).

Is there anything that is an obvious improvement in efficiency -- some clever tricks with the fold functions, some way of loading the data into some sort of convenient structure first, etc -- that I could easily implement?

Firstly I'm assuming that you are working with a granularity of days, and that each promotion will always have a relatively small maximum number of days. If so, how about something like the following:

1: Filter the existing data structure, resulting in a lazy list of tuples (a, b) where a is a day number and b is an identifier for the promotion. Where a promotion spans n days, the list will contain n entries, one for each day of the promotion. Complexity is O(M x N), where M is the (small) maximum number of days. If this is fixed *and* small, we can discard this any regard the complexity as just O(N).

2: Sort the list with a as the primary key and b as the secondary key. Complexity should be near enough O(N log N)

3: Traverse the results of the sort, outputting a lazy list of lists such that the elements of each sub-list are references to the promotions that overlap for one specific day number. Where no overlap is detected for one specific day, that day can simply be ignored. Complexity should be linear.

4: Sort this list, and discard duplicates. Complexity should be O(N log N) for the sort and O(N) for the 'uniq'.

5: You are now left with a list which describes all overlapping promotions.

Total complexity should effectively be O(N + N log N + N + N log N + N) which of course just collapses to O(N log N).

--
     ----------------------------------------------
    / __            + / Sarah Thompson      **** /
   / (_  _  _ _ |_   /                        * /
  /  __)(_|| (_|| ) / [EMAIL PROTECTED]      * /
 / +               / http://findatlantis.com/ /
----------------------------------------------


_______________________________________________
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Naive question on lists of duplicates

Reply via email to