On Tue, 2006-04-18 at 22:06 -0400, Tyler Smith wrote:
> I'm trying to do a non-metric multidimensional scaling using isoMDS.
> However, I have some '0' distances in my data, and I'm not sure how to
> deal with them. I'd rather not drop rows from the original data, as I am
> comparing several datasets (morphology and molecular data) for the same
> individuals, and it's interesting to see how much morphological
> variation can be associated with an identical genotype.
>
> I've tried replacing the 0's with NA, but the isoMDS appears to stop on
> the first iteration and the stress does not improve:
>
> distA # A dist object with 13695 elements, 4 of which == 0
> cmdsA <- cmdscale(distA, k=2)
>
> distB <- distA
> distB[which(distB==0)] <- NA
>
> isoA <- isoMDS(distB, cmdsA)
> initial value 21.835691
> final value 21.835691
> converged
>
> The other approach I've tried is replacing the 0's with small numbers.
> In this case isoMDS does reduce the stress values.
>
> min(distA[which(distA>0)])
> [1] 0.02325581
>
> distC <- distA
> distC[which(distC==0)] <- 0.001
> isoC <- isoMDS(distC)
> initial value 21.682854
> iter 5 value 16.862093
> iter 10 value 16.451800
> final value 16.339224
> converged
>
> So my questions are: what am I doing wrong in the first example? Why
> does isoMDS converge without doing anything? Is replacing the 0's with
> small numbers an appropriate alternative?
>

Tyler,

My experience is that isoMDS *may* fail to move away from the starting
configuration if there are identical points in the initial configuration,
and this will happen if you use cmdscale() to get the initial
configuration. You *may* get over this by shifting the duplicates a bit:

    > con <- cmdscale(dis)
    > dups <- duplicated(con)
    > sum(dups)
    [1] 2
    > con[dups, ] <- con[dups, ] + runif(2*sum(dups), -0.01, 0.01)

Then isoMDS may go further.

Another issue is that, at a quick look, isoMDS() seems to do nothing
sensible with missing values, although it accepts them. The only thing it
does is order them last, that is, treat them as very long distances (in
your case they should rather be treated as very short distances). The key
lines in isoMDS are:

    ord <- order(dis)
    nd <- sum(!is.na(ord))

Even when 'dis' has missing values, the result of order() ('ord') has no
missing values; with the default argument na.last=TRUE the missing entries
are simply put last in the list. An obvious-looking change would be to
replace the second line with:

    nd <- sum(!is.na(dis))

but this "dumps the core" of R, at least on my machine: probably the full
length of the vectors is needed in addition to the number of non-missing
entries. (This quick look was based on the latest release version of
MASS/VR: there may be a newer version with the upcoming R release, but
that's not released yet.)

You may check how things work with NA: are the duplicate points identical
in the results?

Then about replacing zero distances with a tiny number: this has been
discussed before on this list, and Ripley said "no, no!". I do it all the
time, but only in secrecy. A suggested solution was to drop duplicates, but
then there is still a weighting issue, and isoMDS does not have a weights
argument.

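Going back to the order() point, here is a small sketch of what those two lines do with missing values. It uses a made-up vector rather than a real 'dist' object; 'dis' and 'ord' follow the isoMDS lines quoted above, while 'nd_from_ord' and 'nd_from_dis' are names invented just for this illustration. It only shows the behaviour of order() and is.na() in base R, not what happens inside isoMDS afterwards.

```r
# Toy vector standing in for the distances (hypothetical values);
# two entries are NA, as after replacing zero distances with NA.
dis <- c(0.5, NA, 0.2, NA, 0.9)

# order() returns positions, so its result never contains NA.
# With the default na.last = TRUE the NA entries are placed last.
ord <- order(dis)
print(ord)       # positions of the sorted values, NA positions at the end
print(dis[ord])  # sorted values followed by the NAs

# The count as in the two lines quoted above: every entry is counted,
# because 'ord' itself has no missing values.
nd_from_ord <- sum(!is.na(ord))

# Counting the non-missing distances directly gives a smaller number.
nd_from_dis <- sum(!is.na(dis))

cat("nd from ord:", nd_from_ord, "\n")
cat("nd from dis:", nd_from_dis, "\n")
```

So with NA in 'dis' the first count is the full length of the vector and the second is the number of real distances, which is the difference discussed above.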
cheers, jari oksanen
-- 
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/

R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help