-------- Original Message --------
Subject:        Random Skewers with Matrix repeatability function
Date:   Sat, 24 Sep 2011 05:42:19 -0400
From:   Milos Blagojevic <[email protected]>
To:     morphmet <[email protected]>



Cheers to all morphometricians,


I would like to share a three functions (randomSkewersT, Mrepeatability,
and finalRSfun) that I wrote in R and which should be able to implement
the random skewers method of Cheverud for matrix similarity estimation.
This code is attached as a .r and .txt file to this message and can be
used with any two matrices comprising of linear measurements or
interlandmark distances. Its usage and ideas for creating some of the
procedures is described in the code comments. Main issues are as follows:


1. RS generation procedure may be too uniform because of the
element-by-element generation (other solutions I was unable to code-to-work)
2. Vector correlation (usage of R`s native cor function with use =
"pairwise.complete.obs" for missing values gives off slightly different
value than the one obtained by hand as the arc-cosine of the angle
between normalized and centered response vectors).
3. Random vectors used for generation of the distribution against which
to test mean vector correlation (their cor seems to be bounded well
below -1, 1).
4. Matrix repeatability bootstrap with indices (again too uniform
results with my data set)
5. Vector correlation adjustment (unusual values)


These issues I was able to see while testing with my data set but I
would appreciate independent test of this function to see if it really
can provide valid results. Advice on how to code this properly are
welcomed because I`m still poor R programmer (I always tend to use for
loops when I`m sure there is R solution to it) and maybe I did not
understand this method at all.


Finally, I plan on adding modularity tests with adjacency matrices
(Marroig and Cheverud, 2001) as well as evolvability estimations sensu
Hansen and Houle, 2008 to complete the entire package which can be a
part of CRAN and if there is anyone willing to participate let me know.


Best regards,


Milos Blagojevic,
Faculty of Science,
Kragujevac,
Serbia

Attachment: RSfunctionR.R
Description: Binary data

#generate random skewers with matrix tests
#numS is the number of skewers and qtS is the number of variables
#mat1 and mat2 are input V/CV matrices genrated from the data 
#rvNum is the number of random vectors for the testing of the observed matrix 
simmilarity

randomSkewersT <- function(mat1, mat2, numS, qtS, graph = FALSE, rvNum = 1000) 
{
  resSkewersCor <- numeric(numS)                                                
  
  skewers <- matrix(NA, ncol = numS, nrow = qtS)

#the looop for skewers generation from the uniform or normal distribution
#one-by-one element approach which can be substituted but for now gives best 
results in randomness generation

  for (i in 1:numS)
  {
    for (k in 1:qtS)
    {
      #r <- runif(1, min = -1, max = 1)
      r <- rnorm(1, 0, 1)
      skewers[k,i] <- r
    }
  }
  eb <- qtS * rvNum

#normalization of skewers to a unit length of one

  skewers <- skewers/sqrt(sum(skewers^2))

#completely random vectors for bootstrap-like comparison test of the observed 
correlation

  randVect <- matrix(rnorm(eb), nrow=qtS, ncol = rvNum)
  randVect <- randVect/sqrt(sum(randVect * randVect))
  randVectCor <- cor(randVect)
  randVectCor <- randVectCor[col(randVectCor) < row(randVectCor)]
  meanRVcor <- mean(randVectCor)

#application of skewers to both input matrices 

  matS1 <- mat1 %*% skewers
  matS2 <- mat2 %*% skewers

#the problematic part of column-wise correlation between matrices after skewers 
applied 

  for (i in 1:numS)
  {
    resSkewersCor[i] <- cor(matS1[,i], matS2[,i])
  }

#graphing options

  if (graph == TRUE)
  {
    hist(randVectCor, xlim = c(-1, 1))
    re <- mean(resSkewersCor)
    abline(v = re, lty = 3)
  }
  list(meanRVcor = meanRVcor, resSkewersCor = resSkewersCor, meanCor = 
mean(resSkewersCor))
}


#Matrix repeatability function for a single matrix

Mrepeatability <- function(mat1, bootSize = 1000) #input not a VCV matrix
{
  bootM <- numeric(bootSize)
  rD <- dim(mat1)

#resampled matrix of original observations

  bootM <- matrix(NA, nrow = rD[1], ncol = rD[2])
  resultsCor <- numeric(bootSize)

#all possible indices 

  indice <- c(1:length(mat1))

#lapply for bootstrap resamples of indices as many as bootSize

  resamplesIndices <- lapply(1:bootSize, function(i) sample(indice, replace = 
T))

#generation of bootM resampled matrix and the subsequent calcucation of the 
correlation with the original matrix (mat1)
#resCor is a randomSkewersT used for final comparison, and resultsCor is a 
vector of RS correlations between resamples and original matrix

  for (i in 1:bootSize)
  {
    for (k in 1:length(mat1)) {bootM[k] <- mat1[resamplesIndices[[i]][k]]}
    bootVcv <- cov(bootM, use = "pairwise.complete.obs")
    mat1vcv <- cov(mat1, use = "pairwise.complete.obs")
    resCor <- randomSkewersT(mat1vcv, bootVcv, 100, 50, graph = FALSE)$meanCor
    resultsCor[i] <- resCor
  }
}

#the final function that should only be used and provide complete results based 
on input matrices (not V/CV)

finalRSfun <- function(mat1, mat2, numSkew, numVar, bootSize = 50)
{
  mat1vcv <- cov(mat1, use = "pairwise.complete.obs")
  mat2vcv <- cov(mat2, use = "pairwise.complete.obs")
  resS <- randomSkewersT(mat1vcv, mat2vcv, numS = numSkew, qtS = numVar)$meanCor
  mat1R <- mean(Mrepeatability(mat1, bootSize = bootSize))  #mean of all is 
used 
  mat2R <- mean(Mrepeatability(mat2, bootSize = bootSize))
  rmax <- (mat1R * mat2R)/2
  radj <- resS/rmax #unsure of this adjusment sensu Porto et al., 2009
  list(randomSkewers = resS, repeatabilityMat1 = mat1R, repeatabilityMat2 = 
mat2R, adjustedRep = radj)
}

Reply via email to