I was asked in private, but am replying in public so that others can also find this answer in the future:
On Fri, Apr 8, 2022 at 1:11 PM ..... wrote:

> Hello
> dear Dr. Maechler
> I have a question about "pam" function in the cluster package. In this
> function, we choose one of the euclidean or manhattan distances to
> calculate dissimilarity but in the mixed typed data sets the true index may
> be jaccard or other indicators.
> How can we allocate the "true" metric for each variable?
> Best regards

Yes, you can use pam() in two ways; see this part of the help page:

  Arguments:

       x: data matrix or data frame, or dissimilarity matrix or object,
          depending on the value of the ‘diss’ argument.

          In case of a matrix or data frame, each row corresponds to an
          observation, and each column corresponds to a variable.  All
          variables must be numeric.  Missing values (NAs) _are_
          allowed - as long as every pair of observations has at least
          one case not missing.

          In case of a dissimilarity matrix, ‘x’ is typically the output
          of daisy or dist.  Also a vector of length n*(n-1)/2 is
          allowed (where n is the number of observations), and will be
          interpreted in the same way as the output of the
          above-mentioned functions.  Missing values (NAs) are _not_
          allowed.

So, you can first use

    dx <- daisy(x, ...)

to compute the appropriate dissimilarity between your observational units; daisy() handles mixed variable types. After that, you can pass the computed distance / dissimilarity matrix (the `dx`) in your call to pam():

    px <- pam(dx, k=., ....)

I hope this helps you.

With best regards,
Martin

--
Martin Maechler
ETH Zurich
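
P.S. A minimal sketch of the two steps above, in case a concrete example is useful. The data frame and its column names are made up for illustration (they are not from the question), and k = 2 is an arbitrary choice. The `type = list(asymm = ...)` argument is one way to tell daisy() that a 0/1 column is asymmetric binary, so that joint absences (0-0 pairs) are ignored for that variable, which is the Jaccard-like treatment; please check ?daisy for the details that apply to your own data.

```r
library(cluster)

## Hypothetical mixed-type toy data (an assumption, not from the original question)
x <- data.frame(
  height   = c(1.62, 1.75, 1.80, 1.58, 1.91, 1.68),              # numeric: interval scaled
  smoker   = factor(c("no", "yes", "no", "no", "yes", "yes")),   # factor: nominal
  owns_car = c(1, 0, 1, 1, 0, 0),                                # 0/1: declared asymmetric binary below
  grade    = factor(c("low", "mid", "high", "low", "high", "mid"),
                    levels = c("low", "mid", "high"),
                    ordered = TRUE)                              # ordered factor: ordinal
)

## Step 1: mixed-type dissimilarities (Gower's coefficient);
## 'type' overrides the default handling of individual columns
dx <- daisy(x, metric = "gower", type = list(asymm = "owns_car"))

## Step 2: partition around medoids, using the dissimilarities directly
px <- pam(dx, k = 2, diss = TRUE)

px$clustering   # cluster membership of each observation
px$id.med       # row indices of the two medoids
```

summary(dx) reports how each column was interpreted, which is a quick way to verify that every variable got the type you intended before you look at the clustering.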