2008/5/18 Matt Crane <[EMAIL PROTECTED]>:
> On Sun, May 18, 2008 at 8:52 PM, Robert Kern <[EMAIL PROTECTED]> wrote:
>> Are there repeats?
> No, no repeats in the first column.
>
> I'm going to go get a cup of coffee before I forget to leave out any
> potentially vital information again. It's going to be a long day.

It can be done, though I had to be kind of devious. My solution might
not even be O(n log n), depending on how mergesort is implemented:

def find_matches(A,B):
    """Find positions A_idx and B_idx so that A[A_idx]==B[B_idx]

    A and B should be sorted arrays with no repeats; A_idx and
    B_idx will be all the places they agree.

    >>> import numpy as np
    >>> A = np.array([1,2,4,5,7])
    >>> B = np.array([1,3,5,9])
    >>> find_matches(A,B)
    array([0,3]), array([0,2])
    """
    AB = np.concatenate((A,B))
    idx = np.argsort(np.concatenate((A,B)),kind="mergesort")
    sorted = AB[idx]
    pairs = np.where(np.diff(sorted)==0)[0]
    A_pairs = idx[pairs]
    B_pairs = idx[pairs+1]-len(A)

    if np.any(A_pairs>=len(A)):
        raise ValueError, "first array contains a repeated element"
    if np.any(B_pairs<0):
        raise ValueError, "second array contains a repeated element"

    return A_pairs, B_pairs

The idea is that diff is not a bad way to find repeats in a sequence;
so concatenate the two sequences and sort them (if you're lucky
mergesort will be fast because they're individually sorted, but in any
case we need stability). Of course, you have to take the pairs in the
sorted sequence and figure out where they came from, but fortunately
you don't even need to invert the permutation returned by argsort (do
we have a tool to invert permutations, or should one just use argsort
again?).

Anne
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion

Reply via email to