--- "Jackson, Harry" <[EMAIL PROTECTED]> wrote: >
> I have been looking back at this problem and here is what I have found.
>
> Let's take the following set of times:
> A , B
> 1 (4 , 5)
> 2 (9 , 10)
> 3 (11 , 12)
> 4 (12 , 14)
> 5 (12 , 18)
> 6 (14 , 15)
>
> If we sort by column A, the set inherits the following property (Rule 1):
>
> a1 <= a2 is always true
>
> This means that we are only interested in b1's properties in relation to a2
> and b2. If (b1 >= b2), then because of Rule 1 we can ignore the values in
> the second set and remove them.
This seems wrong: why (b1 >= b2)? That reads like an assertion that we aren't
looking backwards in time (let's assume the input has already been checked for
that - it makes the calculations easier, though I could solve it with a double
pass). If I assume you meant (a1 >= b2), then it is not valid:
> A , B
> 1 (4 , 5)
> 2 (9 , 10)
This is already sorted on A, and we can see that a1 >= b2 should never be
true. So let's assume you meant (a2 >= b1):
> A , B
> 1 (4 , 5)
> 2 (9 , 10)
Now, if 5 > 9 held, we could merge these two (in this case it doesn't), the
new set being (4, 10) [again, only if 5 > 9!].
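In code, the test I have in mind is something like this (just a sketch; the
variable names are mine, and I use >= so that sets which merely touch are
merged too):

    # Two sets, already sorted so that $a1 <= $a2 (Rule 1)
    my ($a1, $b1) = (4, 5);
    my ($a2, $b2) = (9, 10);

    if ($b1 >= $a2) {                       # first stop reaches second start
        my $stop = $b1 > $b2 ? $b1 : $b2;   # keep the later stop time
        print "merged: ($a1, $stop)\n";
    }
    else {
        print "no overlap\n";               # here 5 < 9, so they stay apart
    }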
> If this is not the case, then if (b1 < a2) we can safely deduce from Rule 1
> that it has no overlap with any other set, and calculate the time and store
> it.
I think this might be true, providing you have collapsed the sets as above.
It warrants a look from someone better at algorithms than I am.
> If, however, (b1 < a2) is false, then because of Rule 1 we can safely
> replace b1 with b2 and remove the second set.
I'm lost. :P
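For what it's worth, here is my best reading of the three rules as a single
pass over the sorted list (a sketch, untested, and definitely not Harry's
actual code):

    # @sets holds [start, stop] pairs, already sorted on start (column A)
    my @sets  = ([4,5], [9,10], [11,12], [12,14], [12,18], [14,15]);
    my $total = 0;

    my ($a1, $b1) = @{ shift @sets };
    for my $set (@sets) {
        my ($a2, $b2) = @$set;
        if ($b1 >= $b2) {
            next;                     # second set lies inside the first: drop
        }
        elsif ($b1 < $a2) {
            $total += $b1 - $a1;      # no later set can overlap (Rule 1):
            ($a1, $b1) = ($a2, $b2);  # bank this slice, start a new one
        }
        else {
            $b1 = $b2;                # overlap: extend the current slice
        }
    }
    $total += $b1 - $a1;              # don't forget the final slice
    print "total time: $total\n";

That gives (5-4) + (10-9) + (18-11) = 9 on the sample data above, which looks
right to me.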
> The code I have used to get the results is at the end. I have no idea if it
> is a fast solution. I am not sure if it is the best way to code it either,
> and I am sure some people can optimise it for better performance. I wanted
> to use a for loop but went for the easier while loop instead.
If you only have to sort once, and you can then step through in a linear
fashion, it will be pretty fast: one sort followed by a single linear sweep.
It is not dissimilar to what I did (and nobody has complained yet). It does,
however, have an unfortunate side effect: the original sample of data gets
destroyed. That is easily fixable, though. The time complexity of our
solutions is going to be the same [I think], but improvements can be made if
we preprocess the input data, which is not unlikely. Imagine a calendar with
associated timeslices, where you could do some of the preprocessing up front
and even store the results to disk.
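The fix is simply to work on a copy. Something like this (a sketch, assuming
sets are passed in as [start, stop] array references):

    # Merge without touching the caller's data: sort a deep-enough copy,
    # then build a fresh list of merged slices.
    sub merge_slices {
        return unless @_;
        my @sorted = sort { $a->[0] <=> $b->[0] } map { [@$_] } @_;
        my @merged = shift @sorted;
        for my $set (@sorted) {
            if ($set->[0] <= $merged[-1][1]) {   # overlaps the last slice
                $merged[-1][1] = $set->[1] if $set->[1] > $merged[-1][1];
            }
            else {
                push @merged, $set;              # disjoint: start a new slice
            }
        }
        return @merged;                          # input left intact
    }

    my @slices = merge_slices([4,5], [9,10], [11,12], [12,14], [12,18], [14,15]);
    # @slices is now ([4,5], [9,10], [11,18])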
I was going to try using an AVL tree, but someone hasn't written Tree::AVL
yet! I don't have time just now, so it's going to have to be on the back
burner for the moment.
> Some other considerations that may have to be addressed if this is going to
> become a module.
Some of which I've covered.
> For thousands of records: if we first find the greatest stop time, then as
> soon as our current set has this value as its stop time we can calculate the
> total and exit without iterating over any more sets, providing the set has
> been sorted on column A.
Huh?
> When we have sets with matching start times, we are only interested in the
> set with the largest difference, and vice versa for the stop times.
Time complexity to detect that case is poor, is it not?
Jonathan Paton
=====
s''-//--/\\///|-\/\|--\--/-\-/\-//\-|/\\\|/\///|-\--\\\\',
s''/-\\\/|///|-|/|/--\--/--//\|\/\||/|/-/\\\-/\///|-\-\-',
y'|\/-'3210',$_=join qq\\,map{s|2|10|||s|3|11|||s|^|0|;$_}
m|.|g;map{print chr unpack'N',pack'B32','0'x24 .$_}/.{8}/g