Have you searched for "Markov Chain" via "www.r-project.org" -> search -> "R site search"? I just got "138 documents matching your query". The fifth one suggested "chapter 5 of Jim Lindsey's online document 'The statistical analysis of stochastic processes in Time', at his website www.luc.ac.be/~jlindsey". I found this document mentioned under "recent publications". The book may no longer be downloadable, but his examples still are.

There are probably other tools of interest to you in that list, and perhaps someone else will enlighten both of us on this.

There may be an easier way to do what you ask. If I understand your question correctly, the following seems to do it for me:

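# The four bases and the two labels
bases <- c("A","C","G","T")
sgn <- c("+", "-")

# All 8 signed bases: A+ C+ G+ T+ A- C- G- T-
signedBases <- as.vector(
    outer(bases, sgn, paste, sep=""))
# Map each signed base to its row/column number
sBnum <- 1:8
names(sBnum) <- signedBases

# Simulate a sequence of signed bases for illustration
set.seed(1)
seqLen <- 100
sBaseSeq <- sample(x=signedBases,
           size=seqLen, replace=TRUE)

# Count each (thisBase, nextBase) pair that occurs
nextBase <- aggregate(sBaseSeq[-seqLen],
     list(thisBase=sBaseSeq[-seqLen],
          nextBase=sBaseSeq[-1]), length)

# 8x8 matrix of zeros, so unobserved pairs stay 0
transFreq <- array(0, dim=c(8,8))
dimnames(transFreq) <- list(signedBases,
                           signedBases)
# Convert the pair names to (row, column) indices
nBnum <- array(
   sBnum[as.matrix(nextBase[1:2])],
              dim=dim(nextBase[1:2]))

# Fill in the observed counts by matrix indexing
transFreq[nBnum] <- nextBase[[3]]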

> transFreq
   A+ C+ G+ T+ A- C- G- T-
A+  1  2  1  2  0  2  0  1
C+  2  3  1  0  0  3  1  1
G+  0  0  2  5  2  1  2  0
T+  1  2  2  1  1  3  8  2
A-  0  0  0  1  1  1  1  1
C-  2  1  1  5  0  2  2  2
G-  3  1  2  4  2  2  1  2
T-  0  2  2  2  0  1  2  1
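
The question starts from two separate sequences (bases and labels) rather than one sequence of signed bases. As a minimal sketch of that case, the two can be pasted together position by position and the pairs counted with table(). The names baseSeq, labelSeq, transTab and transProb are made up for this illustration, and the inputs are simulated, so the counts are not expected to match the matrix above. The last step, row-wise proportions, is an addition that may or may not be what is wanted:

```r
# Hypothetical inputs standing in for the two equal-length sequences
# described in the question: one of bases, one of "+"/"-" labels.
bases <- c("A", "C", "G", "T")
sgn <- c("+", "-")
set.seed(1)
seqLen <- 100
baseSeq <- sample(bases, size = seqLen, replace = TRUE)
labelSeq <- sample(sgn, size = seqLen, replace = TRUE)

# Fuse base and label into one symbol per position, e.g. "A+".
signedBases <- as.vector(outer(bases, sgn, paste, sep = ""))
sBaseSeq <- factor(paste0(baseSeq, labelSeq), levels = signedBases)

# Cross-tabulate each symbol against its successor. Using a factor
# with all 8 levels keeps pairs that never occur as explicit zeros.
transTab <- table(thisBase = sBaseSeq[-seqLen],
                  nextBase = sBaseSeq[-1])

# Row-wise proportions: empirical transition frequencies.
# A row with no observations would come out as NaN.
transProb <- prop.table(transTab, margin = 1)
```

Because the factor levels are fixed in advance, the result is always 8x8 in the same row and column order as transFreq, without the separate index-lookup step.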

Hope this helps.
Spencer Graves

dax42 wrote:

Hello,

I have got the following problem:
I am given a large string sequence consisting of the four letters "A", "C", "G" and "T" (as before). Additionally, I have a second string sequence of the same length giving a label for each character. The labels are "+" and "-".


Now I would like to create an 8x8 matrix containing the counts of how often each possible pairwise combination occurs, for example "A" with the label "+" followed by "C" with the label "+", or "T"->"C" with the labels "-"->"+", etc.

Of course I can just use loops to "walk" along the sequence, but as you showed me so much better solutions in response to my last mail, I thought you might be able to help and improve my R skills even further.

Thanks for your ideas!
Cheers, Winnie

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


--
Spencer Graves, PhD, Senior Development Engineer
O:  (408)938-4420;  mobile:  (408)655-4567
