-------- Original Message -------- Subject: Random Skewers with Matrix repeatability function Date: Sat, 24 Sep 2011 05:42:19 -0400 From: Milos Blagojevic <[email protected]> To: morphmet <[email protected]> Cheers to all morphometricians, I would like to share a three functions (randomSkewersT, Mrepeatability, and finalRSfun) that I wrote in R and which should be able to implement the random skewers method of Cheverud for matrix similarity estimation. This code is attached as a .r and .txt file to this message and can be used with any two matrices comprising of linear measurements or interlandmark distances. Its usage and ideas for creating some of the procedures is described in the code comments. Main issues are as follows: 1. RS generation procedure may be too uniform because of the element-by-element generation (other solutions I was unable to code-to-work) 2. Vector correlation (usage of R`s native cor function with use = "pairwise.complete.obs" for missing values gives off slightly different value than the one obtained by hand as the arc-cosine of the angle between normalized and centered response vectors). 3. Random vectors used for generation of the distribution against which to test mean vector correlation (their cor seems to be bounded well below -1, 1). 4. Matrix repeatability bootstrap with indices (again too uniform results with my data set) 5. Vector correlation adjustment (unusual values) These issues I was able to see while testing with my data set but I would appreciate independent test of this function to see if it really can provide valid results. Advice on how to code this properly are welcomed because I`m still poor R programmer (I always tend to use for loops when I`m sure there is R solution to it) and maybe I did not understand this method at all. Finally, I plan on adding modularity tests with adjacency matrices (Marroig and Cheverud, 2001) as well as evolvability estimations sensu Hansen and Houle, 2008 to complete the entire package which can be a part of CRAN and if there is anyone willing to participate let me know. Best regards, Milos Blagojevic, Faculty of Science, Kragujevac, Serbia
RSfunctionR.R
Description: Binary data
#generate random skewers with matrix tests
#numS is the number of skewers and qtS is the number of variables
#mat1 and mat2 are input V/CV matrices genrated from the data
#rvNum is the number of random vectors for the testing of the observed matrix
simmilarity
randomSkewersT <- function(mat1, mat2, numS, qtS, graph = FALSE, rvNum = 1000)
{
resSkewersCor <- numeric(numS)
skewers <- matrix(NA, ncol = numS, nrow = qtS)
#the looop for skewers generation from the uniform or normal distribution
#one-by-one element approach which can be substituted but for now gives best
results in randomness generation
for (i in 1:numS)
{
for (k in 1:qtS)
{
#r <- runif(1, min = -1, max = 1)
r <- rnorm(1, 0, 1)
skewers[k,i] <- r
}
}
eb <- qtS * rvNum
#normalization of skewers to a unit length of one
skewers <- skewers/sqrt(sum(skewers^2))
#completely random vectors for bootstrap-like comparison test of the observed
correlation
randVect <- matrix(rnorm(eb), nrow=qtS, ncol = rvNum)
randVect <- randVect/sqrt(sum(randVect * randVect))
randVectCor <- cor(randVect)
randVectCor <- randVectCor[col(randVectCor) < row(randVectCor)]
meanRVcor <- mean(randVectCor)
#application of skewers to both input matrices
matS1 <- mat1 %*% skewers
matS2 <- mat2 %*% skewers
#the problematic part of column-wise correlation between matrices after skewers
applied
for (i in 1:numS)
{
resSkewersCor[i] <- cor(matS1[,i], matS2[,i])
}
#graphing options
if (graph == TRUE)
{
hist(randVectCor, xlim = c(-1, 1))
re <- mean(resSkewersCor)
abline(v = re, lty = 3)
}
list(meanRVcor = meanRVcor, resSkewersCor = resSkewersCor, meanCor =
mean(resSkewersCor))
}
#Matrix repeatability function for a single matrix
Mrepeatability <- function(mat1, bootSize = 1000) #input not a VCV matrix
{
bootM <- numeric(bootSize)
rD <- dim(mat1)
#resampled matrix of original observations
bootM <- matrix(NA, nrow = rD[1], ncol = rD[2])
resultsCor <- numeric(bootSize)
#all possible indices
indice <- c(1:length(mat1))
#lapply for bootstrap resamples of indices as many as bootSize
resamplesIndices <- lapply(1:bootSize, function(i) sample(indice, replace =
T))
#generation of bootM resampled matrix and the subsequent calcucation of the
correlation with the original matrix (mat1)
#resCor is a randomSkewersT used for final comparison, and resultsCor is a
vector of RS correlations between resamples and original matrix
for (i in 1:bootSize)
{
for (k in 1:length(mat1)) {bootM[k] <- mat1[resamplesIndices[[i]][k]]}
bootVcv <- cov(bootM, use = "pairwise.complete.obs")
mat1vcv <- cov(mat1, use = "pairwise.complete.obs")
resCor <- randomSkewersT(mat1vcv, bootVcv, 100, 50, graph = FALSE)$meanCor
resultsCor[i] <- resCor
}
}
#the final function that should only be used and provide complete results based
on input matrices (not V/CV)
finalRSfun <- function(mat1, mat2, numSkew, numVar, bootSize = 50)
{
mat1vcv <- cov(mat1, use = "pairwise.complete.obs")
mat2vcv <- cov(mat2, use = "pairwise.complete.obs")
resS <- randomSkewersT(mat1vcv, mat2vcv, numS = numSkew, qtS = numVar)$meanCor
mat1R <- mean(Mrepeatability(mat1, bootSize = bootSize)) #mean of all is
used
mat2R <- mean(Mrepeatability(mat2, bootSize = bootSize))
rmax <- (mat1R * mat2R)/2
radj <- resS/rmax #unsure of this adjusment sensu Porto et al., 2009
list(randomSkewers = resS, repeatabilityMat1 = mat1R, repeatabilityMat2 =
mat2R, adjustedRep = radj)
}
