Peter Otten wrote:

> MooMaster wrote:
>
>> Now we can't calculate a meaningful Euclidean distance for something
>> like "Iris-setosa" and "Iris-versicolor" unless we use string-edit
>> distance or something overly complicated, so instead we'll use a
>> simple quantization scheme of enumerating the set of values within the
>> column domain and replacing the strings with numbers (i.e. Iris-setosa
>> = 1, iris-versicolor=2).
>
> I'd calculate the distance as
>
> def string_dist(x, y, weight=1):
>     return weight * (x == y)

oops, this must of course be (x != y).

> You don't get a high resolution in that dimension, but you don't introduce
> an element of randomness, either.
>
> Peter
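
For reference, here is a minimal sketch of the corrected function and one way it might be folded into a Euclidean distance over rows that mix numeric and string columns. Only string_dist comes from the thread; mixed_dist and the row layout are assumptions made up for illustration, not anything the original posters wrote.

```python
import math


def string_dist(x, y, weight=1):
    # Corrected version: 0 when the labels match, `weight` when they differ.
    # (x != y) is a bool, which Python treats as 0 or 1 in arithmetic.
    return weight * (x != y)


def mixed_dist(a, b, weight=1):
    # Hypothetical helper (not from the thread): Euclidean distance over two
    # equal-length rows whose columns are either numbers or strings.
    total = 0.0
    for x, y in zip(a, b):
        if isinstance(x, str) or isinstance(y, str):
            d = string_dist(x, y, weight)
        else:
            d = x - y
        total += d * d
    return math.sqrt(total)
```

With this scheme every pair of distinct labels is the same distance apart, so no ordering is imposed on the classes, whereas the 1, 2, ... enumeration would put the first class closer to the second than to the third.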