Re: [R] caculate the frequencies of the Amino Acids
I sent a private reply to the poster yesterday with the solution to his problem, but evidently he hasn't seen it, so ...

# Function to tabulate an amino acid sequence
# using substring and table:

AAtab2 <- function(x) {
    # Creates a frequency table of the characters in a string
    AA <- LETTERS[-c(2, 10, 15, 21, 24, 26)]
    n <- nchar(x)
    s <- substring(x, 1:n, 1:n)
    table(factor(s, levels = AA))
}

AAtabMatrix <- function(x) {
    # Input x is an amino acid sequence as a character string
    if(!is.character(x)) stop('Input must be a character string or vector.')
    do.call(rbind, lapply(as.list(x), AAtab2))
}

cheseq <- scan(url('http://n4.nabble.com/file/n997581/sequence.txt'), what = '')

> AAtabMatrix(cheseq)
       A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
 [1,]  5  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 [2,] 21  0 25 28 27 24  0 34 39 31 11 20 16 10 17 25 22 33  3 15
 [3,] 34  5 15 25  6 35  7 24 23 32  9 12 15 10 17 14 13 36  2 13
 [4,] 33  5 17 24  7 36  7 24 24 32  9 13 14  9 17 12 14 36  2 12
 [5,] 33  5 16 25  5 35  6 24 23 33  8 12 15  9 17 17 12 35  2 15
 [6,] 33  4 15  6 21 30  3 19 23 22  8  8  8 14 17 14 12 24  5 12
 [7,] 30  3 13  4 16 22  2 17 16 17  6  6  7 11 15 11 12 18  3 11
 [8,] 39  5 21  8 22 39  2 23 29 25 10  8  7 13 22 14 21 25  7 16
 [9,] 34  4 17  6 19 30  2 20 24 21  8  7  7 12 17 14 16 21  5 14
[10,] 35  4 17  6 18 31  3 20 23 21  8  7  7 12 18 12 17 21  5 13

Each row represents the frequency table of an individual AA sequence.
DM

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
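For readers without access to the attached file, here is a minimal sketch of the same tabulate-then-rbind idea run on made-up data. The vector toy and the helper count_one are hypothetical names introduced only for illustration, and the row-wise proportions are an optional extra step that the reply above does not perform.

```r
# Toy stand-ins for the lines of sequence.txt (assumed data, not the real file)
toy <- c("ACDA", "KKLMA", "A")

# The 20 one-letter amino acid codes: LETTERS without B, J, O, U, X, Z
AA <- LETTERS[-c(2, 10, 15, 21, 24, 26)]

# Count each amino acid in one string; fixed factor levels keep
# letters that do not occur in the string as explicit zeros
count_one <- function(s) {
  chars <- strsplit(s, "")[[1]]
  table(factor(chars, levels = AA))
}

# One row per sequence, one column per amino acid
counts <- do.call(rbind, lapply(toy, count_one))

# Optional: turn counts into row-wise relative frequencies
props <- prop.table(counts, margin = 1)

print(counts[, c("A", "C", "D", "K")])
print(round(props[, c("A", "K")], 2))
```

Because every table shares the same 20 levels, rbind can stack them into a rectangular matrix even when a sequence is missing some amino acids.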
Re: [R] caculate the frequencies of the Amino Acids
On Jan 3, 2010, at 12:28 AM, che wrote:

> Thanks very much the code is working perfectly, but I hope guys that you
> can help me to do the same thing but by using the loop structure, i want
> to know if i am doing right, i want to use the loop structure to scan each
> sequence from the file sequence.txt (the file is attached) to get the
> frequency for each Amino Acid, and i wrote the following code so far, and
> i stopped, got confused, specially that i am a very beginner in R
> http://n4.nabble.com/file/n997581/sequence.txt sequence.txt :
>
> x<-read.table("sequence.txt",header=FALSE)
> AA<-c('A','C','D','E','F','G','H','I','K','L','M','N','P','Q','R','S','T','V','W','Y')
> test<-nchar(as.character(x$V1[i]))
> frequency<-function(X)
> {
> y<-rep(0,20)

I earlier pointed out that such a structure would be inadequate to hold the tabulation of more than one sequence. You probably need a matrix of "width" = 20 and "depth" = the number of your sequences.

> for(j in 1:test){
> for(i in 1:nrow(x)){
> res<-which(AA==substr(x$V1[i],j,j))
> y[res]=y[res]+1

... and here you will need to index y[ , ] with both the proper row and column.

> }
> }
> return(y)
> }
> So how to fix this code, how to give the life for the “i” and the “j” in
> order to initiate the indexing. Sorry for bothering you guys.

--
David.
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
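A minimal sketch of the loop version with the two changes suggested in this reply (a matrix instead of a length-20 vector, and indexing by both row and column). The toy data frame stands in for read.table("sequence.txt", header = FALSE), and frequency_matrix is a hypothetical name, not something from the thread. The loops are also reordered so that "i" exists before it is used to look up a sequence length.

```r
# Toy stand-in for read.table("sequence.txt", header = FALSE)  (assumed data)
x <- data.frame(V1 = c("ACDA", "KKLMA", "A"), stringsAsFactors = FALSE)

AA <- c('A','C','D','E','F','G','H','I','K','L','M','N','P','Q','R','S','T','V','W','Y')

# Hypothetical helper: returns one row of counts per sequence
frequency_matrix <- function(seqs, AA) {
  seqs <- as.character(seqs)
  # "depth" = number of sequences, "width" = number of amino acids
  y <- matrix(0, nrow = length(seqs), ncol = length(AA),
              dimnames = list(NULL, AA))
  for (i in seq_along(seqs)) {              # outer loop: which sequence
    for (j in seq_len(nchar(seqs[i]))) {    # inner loop: position within it
      res <- which(AA == substr(seqs[i], j, j))
      y[i, res] <- y[i, res] + 1            # does nothing if the letter is not in AA
    }
  }
  y
}

y <- frequency_matrix(x$V1, AA)
print(y[, c("A", "C", "D", "K", "L", "M")])
```

The sequences are passed in as an argument rather than read from a global x, so the function can be tested on any character vector.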
Re: [R] caculate the frequencies of the Amino Acids
Thanks very much, the code is working perfectly. But I hope you guys can help me do the same thing using a loop structure, as I want to know if I am doing it right. I want to use loops to scan each sequence from the file sequence.txt (the file is attached) and get the frequency of each amino acid. I wrote the following code so far, then stopped because I got confused, especially as I am a complete beginner in R.

http://n4.nabble.com/file/n997581/sequence.txt sequence.txt :

x<-read.table("sequence.txt",header=FALSE)
AA<-c('A','C','D','E','F','G','H','I','K','L','M','N','P','Q','R','S','T','V','W','Y')
test<-nchar(as.character(x$V1[i]))
frequency<-function(X)
{
y<-rep(0,20)
for(j in 1:test){
for(i in 1:nrow(x)){
res<-which(AA==substr(x$V1[i],j,j))
y[res]=y[res]+1
}
}
return(y)
}

So how do I fix this code? How do I bring the “i” and the “j” to life so that the indexing gets started? Sorry for bothering you guys.

--
View this message in context: http://n4.nabble.com/caculate-the-frequencies-of-the-Amino-Acids-tp997072p997581.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] caculate the frequencies of the Amino Acids
Hi fadialnaji,

Take a look at the Biostrings package in Bioconductor [1]. It might be an alternative way to do what you want.

HTH,
Jorge

[1] http://www.bioconductor.org/packages/release/bioc/html/Biostrings.html
Re: [R] caculate the frequencies of the Amino Acids
On Jan 2, 2010, at 12:55 AM, che wrote:

> i know it would be better to ask R to make the data, but i need to
> sequence this particular file, because it is data for some Amino Acids and
> i cant play with, so i need to ask R to go through the sequence one by
> one, and then give me the numbers of each letters of each sequence, i am
> quite confused between using "i" and "j" and how to iterate both of them
> and make them work functionally. i attached the sequence.txt with my
> original message, and i will attach it here in case. thanks for your help.
> http://n4.nabble.com/file/n997087/sequence.txt sequence.txt

Sorry. I did not read to the very end. My apologies; hopefully the following one-liner will make up for my dereliction of attention.

After copy-pasting the sequences from a browser window to a character object, "seqnc", I then processed it:

    > seqlines <- readLines(textConnection(seqnc))

    # Then for the first sequence:
    > table(strsplit(seqlines[1], vector()) )

     A  D  E  F  G  I  K  L  M  N  P  Q  R  S  T  V  W  Y
    21 25 28 27 24 34 39 31 11 20 16 10 17 25 22 33  3 15

    # For "mass production":

The names that resulted from my first effort were a bit unwieldy (> 200 characters long) so I unnamed it:

    unname( sapply(seqlines, function(x) table(strsplit(x, vector() ) ) ) )

    [[1]]
     A  D  E  F  G  I  K  L  M  N  P  Q  R  S  T  V  W  Y
    21 25 28 27 24 34 39 31 11 20 16 10 17 25 22 33  3 15

    [[2]]
     A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
    34  5 15 25  6 35  7 24 23 32  9 12 15 10 17 14 13 36  2 13

    [[3]]
     A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
    33  5 17 24  7 36  7 24 24 32  9 13 14  9 17 12 14 36  2 12

    [[4]]
     A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
    33  5 16 25  5 35  6 24 23 33  8 12 15  9 17 17 12 35  2 15

    [[5]]
     A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
    33  4 15  6 21 30  3 19 23 22  8  8  8 14 17 14 12 24  5 12

    [[6]]
     A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
    30  3 13  4 16 22  2 17 16 17  6  6  7 11 15 11 12 18  3 11

    [[7]]
     A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
    39  5 21  8 22 39  2 23 29 25 10  8  7 13 22 14 21 25  7 16

    [[8]]
     A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
    34  4 17  6 19 30  2 20 24 21  8  7  7 12 17 14 16 21  5 14

    [[9]]
     A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
    35  4 17  6 18 31  3 20 23 21  8  7  7 12 18 12 17 21  5 13

    [[10]]
    A
    5

--
David.
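The sapply() call above returns a list rather than a matrix because the individual tables differ in length (18, 20 and 1 entries). A rough sketch of one way around that, assuming toy data in place of the real seqlines: give every table the same 20 factor levels so that sapply() can simplify the result. The sample strings and the variable name tab are illustrative assumptions.

```r
# Assumed toy data in place of the real sequence lines
seqlines <- c("ACDA", "KKLMA", "A")

# The 20 one-letter amino acid codes
AA <- LETTERS[-c(2, 10, 15, 21, 24, 26)]

# Every table now has length 20, so sapply() simplifies to a matrix
# (amino acids in rows, sequences in columns)
tab <- sapply(seqlines,
              function(s) table(factor(strsplit(s, "")[[1]], levels = AA)),
              USE.NAMES = FALSE)

# Transpose so that each row is one sequence
tab <- t(tab)
print(tab[, c("A", "C", "K")])
```

USE.NAMES = FALSE plays the role of unname() in the original one-liner, avoiding the very long names derived from the sequences themselves.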
Re: [R] caculate the frequencies of the Amino Acids
I know it would be better to ask R to generate the data, but I need to work with this particular file, because it holds data for some amino acids that I cannot alter. So I need R to go through the sequences one by one and give me the count of each letter in each sequence. I am quite confused about using "i" and "j", and how to iterate over both of them so that they work properly. I attached sequence.txt to my original message, and I will attach it here again just in case. Thanks for your help.

http://n4.nabble.com/file/n997087/sequence.txt sequence.txt

--
View this message in context: http://n4.nabble.com/caculate-the-frequencies-of-the-Amino-Acids-tp997072p997087.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] caculate the frequencies of the Amino Acids
On Jan 2, 2010, at 12:26 AM, David Winsemius wrote:

> On Jan 1, 2010, at 11:59 PM, che wrote:
>
>> may some one please help me to sort this out, i am trying to writ a R
>> code for calculating the frequencies of the amino acids in 9 different
>> sequences, i want the code to read the sequence from external text file,
>> i used the following code to do so:
>> x<-read.table("sequence.txt",header=FALSE)
>>
>> then i defined an array for 20 amino acids as following:
>> AA<-c('A','C','D','E','F','G','H','I','K','L','M','N','P','Q','R','S','T','V','W','Y')
>> i am using the following code to calculate the frequencies:
>>
>> frequency<-function(X)
>> {
>> y<-rep(0,20)

Further thoughts: If I understand how you are doing this, that structure would only be large enough for one string's results. (Seems like this process would be easier if you used table() instead of for-loops.)

>> for(j in 1:nchar(as.character(x$V1[i]))){
>
> # at this point you are referencing "i" but it is not yet being iterated and might not even exist.
> # did you mean "j"?
> # also might be safer to use seq_along()
>
>> for(i in 1:9){
>> res<-which(AA==substr(x$V1[i],j,j))
>
> # Is that really working for even one sequence? Without an "x" sequence I cannot test, but it "looks wrong".
>
>> y[res]=y[res]+1
>> }
>> }
>> return(y)
>> }
>>
>> but this code actually is not working, it reads only one sequence, i dont
>> know why the loop is not working for the "i", which suppose to read the
>> nine rows of the file sequence.txt. the sequence.txt file is attached to
>> this message.
>>
>> cheers
>> http://n4.nabble.com/file/n997072/sequence.txt sequence.txt
Re: [R] caculate the frequencies of the Amino Acids
On Jan 1, 2010, at 11:59 PM, che wrote:

> may some one please help me to sort this out, i am trying to writ a R code
> for calculating the frequencies of the amino acids in 9 different
> sequences, i want the code to read the sequence from external text file, i
> used the following code to do so:
> x<-read.table("sequence.txt",header=FALSE)
>
> then i defined an array for 20 amino acids as following:
> AA<-c('A','C','D','E','F','G','H','I','K','L','M','N','P','Q','R','S','T','V','W','Y')
> i am using the following code to calculate the frequencies:
>
> frequency<-function(X)
> {
> y<-rep(0,20)
> for(j in 1:nchar(as.character(x$V1[i]))){

# at this point you are referencing "i" but it is not yet being iterated and might not even exist.
# did you mean "j"?
# also might be safer to use seq_along()

> for(i in 1:9){
> res<-which(AA==substr(x$V1[i],j,j))

# Is that really working for even one sequence? Without an "x" sequence I cannot test, but it "looks wrong".

> y[res]=y[res]+1
> }
> }
> return(y)
> }
>
> but this code actually is not working, it reads only one sequence, i dont
> know why the loop is not working for the "i", which suppose to read the
> nine rows of the file sequence.txt. the sequence.txt file is attached to
> this message.
>
> cheers
> http://n4.nabble.com/file/n997072/sequence.txt sequence.txt
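A small illustration of why the comment above calls seq_along() safer than 1:n. Here seq_len() is used as the counterpart for a length taken from nchar(); the empty string is an assumed example of a blank line in the input, and visited is a throwaway counter introduced for the demonstration.

```r
s <- ""   # an empty sequence, e.g. a blank line in the file (assumed example)

# 1:nchar(s) is 1:0, which counts DOWN and yields c(1, 0),
# so a for-loop over it would run twice on an empty string
bad_index <- 1:nchar(s)

# seq_len(0) is an empty vector, so a loop over it never runs
good_index <- seq_len(nchar(s))

visited <- 0
for (j in good_index) visited <- visited + 1

print(bad_index)
print(length(good_index))
print(visited)
```

The other point in the comment is about ordering: the outer loop has to define "i" before nchar(x$V1[i]) can be evaluated for the inner loop.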
[R] caculate the frequencies of the Amino Acids
Could someone please help me sort this out? I am trying to write R code to calculate the frequencies of the amino acids in 9 different sequences. I want the code to read the sequences from an external text file, and I used the following code to do so:

x<-read.table("sequence.txt",header=FALSE)

Then I defined an array for the 20 amino acids as follows:

AA<-c('A','C','D','E','F','G','H','I','K','L','M','N','P','Q','R','S','T','V','W','Y')

I am using the following code to calculate the frequencies:

frequency<-function(X)
{
y<-rep(0,20)
for(j in 1:nchar(as.character(x$V1[i]))){
for(i in 1:9){
res<-which(AA==substr(x$V1[i],j,j))
y[res]=y[res]+1
}
}
return(y)
}

But this code is not working: it reads only one sequence. I don't know why the loop is not working for the "i", which is supposed to run over the nine rows of the file sequence.txt. The sequence.txt file is attached to this message.

cheers
http://n4.nabble.com/file/n997072/sequence.txt sequence.txt

--
View this message in context: http://n4.nabble.com/caculate-the-frequencies-of-the-Amino-Acids-tp997072p997072.html
Sent from the R help mailing list archive at Nabble.com.