Andrei Alexandrescu wrote:
Don wrote:
BCS wrote:
in sudocode
1. build a min heap out of the elements based on the first element of
each array
2. remove the first value from the root of the heap,
3. if empty array, drop it
4. if not empty re-heap
5. goto 2 if anything left
In practice, I bet it'll go faster with a couple more steps:
2. remove the first value (from array X) from the root of the heap,
2a. Determine new root of the heap (don't need to do a full re-heap yet).
2b. move all values from X until empty or >= new root.
4. reheap
Correct. Unfortunately my heap primitives didn't quite allow that so
currently I do it the "hard" way: pop from top, remove top from heap,
reinsert (if not empty). But this is definitely an optimization worth
doing. I think I'll implement it.
There are interesting cases like:
[5,6,7,8,9,9,......9]
[30, 31,32,33,34...38]
[1,1,1,1,1,1,1,1,1,1,...1]
[10,11,12,13,14,15,16,16,......., 16]
where in theory, the total number of comparisons required is a
function only of the number of arrays, and is independent of the
actual number of elements.
If it's common to have runs where consecutive values come from a
single array, and the arrays are random access, you could do MUCH
better than O(n).
I think you can't do better. Complexity is O(n) at a minimum because you
need to go through each element. The heap-based algorithm has O(n log
m). What you could improve on is the log m factor (m = the number of sets).
Depending on what "going through each element" means. Certainly, if you
need to move them, you can't beat O(n) moves, but you can definitely get
O(m*m) < < O(n) comparisons in the best case I described. And that would
also be the number of splicing operations.
You could approximate this without hurting the worst-case too much, by
(for example) once you get a run of more than K entries from the same
array, step forward 2 entries; if that's still less than the root, copy
both to the output, then step forward 4 entries, etc.
That should give a best-case O(m log n) comparisons/splices.
Which might not be of any practical use, of course.