I think I am making this problem harder than it has to be, and so I keep getting stuck on what might be a trivial problem.

I have used the seqinr package to load a protein sequence alignment containing 15 protein sequences:

> library(seqinr)
> x = read.alignment("proteins.fasta", format="fasta", forceToLower=FALSE)

This automatically loads a list of 4 elements, including the sequences and other information. I store the sequences in a new variable:

> mylist = x$seq

which returns a character vector of 15 strings. I have found that if I split the long character strings into individual characters, it is easy to use lapply to loop over the resulting list. So I use strsplit:

> list.2 = strsplit(mylist, split = NULL)

From this list I can determine which proteins have changes at certain positions by using:

> lapply(list.2, "[", 10) == "L"

This returns a logical TRUE/FALSE vector indicating which elements of the list do or do not have the letter L at position 10.

Because each of the protein sequences contains 99 amino acids, I want to automate this process so that I do not have to compare positions one by one. Most of the changes occur between positions 10 and 95. I have a standard character vector (one amino acid letter per position) that I wish to use as the reference when looping through the list.

Should I combine everything (the standard amino acid vector and the list of protein sequences) into one list? Or is it better to keep them separate for this comparison? I am also not sure what the output should be, since I need to use it in another statistical test. Would a list of logical vectors be the most suitable output to return?
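For concreteness, here is a minimal sketch of the kind of automated comparison being asked about. It assumes the standard vector holds one letter per alignment position and keeps it separate from the sequences. The sequences, the names `seqs`, `ref`, `aa` and `differs`, and the short position range are made-up stand-ins for `x$seq`, the standard vector and positions 10:95; this is one possible approach, not necessarily the best one.

```r
# Toy stand-ins (hypothetical data) for x$seq and the standard vector
seqs <- c(p1 = "MKLVA", p2 = "MKIVA", p3 = "MRLVG")
ref  <- c("M", "K", "L", "V", "A")   # reference residue at each position
positions <- 2:4                     # would be 10:95 for the real alignment

# Split each string into single characters and stack them into a
# character matrix: one row per protein, one column per position
aa <- do.call(rbind, strsplit(seqs, split = ""))

# For each position of interest, flag the proteins whose residue
# differs from the reference residue at that position
differs <- sapply(positions, function(i) aa[, i] != ref[i])
colnames(differs) <- positions

differs            # logical matrix: proteins x positions
colSums(differs)   # number of proteins that differ at each position
```

A logical matrix like this carries the same information as a list of logical vectors, but row and column summaries (`rowSums`, `colSums`) come for free, which may be convenient for a later statistical test.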
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.