Dear R users,

I wrote a simple script to change the header lines in a FASTA file that contains DNA sequences in the format:

    >header1
    sequence1
    >header2
    sequence2

I am trying to replace each "header" in this file with a line from another file (a taxonomy file). To do that, I have to find the matching header in the taxonomy file. The output should be in FASTA format, and it is, but the rows repeat, so the output file is huge and looks like:

    >header1
    sequence1
    >header1
    sequence1
    >header2
    sequence2

The code I have is:

    tax=read.table("taxonomy_file.txt", header=F, quote="", sep="\t")
    tax2=data.frame(tax)

    library("Biostrings")
    seqs=readDNAStringSet("File.fasta")
    header=names(seqs)
    seqs2=paste(seqs)

    new.final=NULL
    i=1
    #Go through tax file and match the header in tax file to header in seqs file
    for(i in 1:length(tax[,1])){
      sampleID=NULL
      match=NULL
      sampleID=as.character(tax2[i,1]) #sample ID in taxonomy header
      match=which(sampleID==header) #index for match in header file
      if(match>0){
        newH1=NULL
        newH2=NULL
        seqline=NULL
        new.header=NULL
        newH1=as.character(tax2[i,1])
        newH2=as.character(tax2[i,2])
        seqline=seqs2[match]
        new.header=paste(">",newH1,"|",newH2, sep="")
        new.final=rbind(new.final, new.header, seqline)
      }
      print(paste("percent complete =", round((i/length(tax2[,1]))*100,3), "%",sep=" "))
      write.table(new.final, file="Test_output.txt", quote=FALSE, sep="\n", col.names=FALSE, row.names=FALSE, append=TRUE)
      i=i+1
    }

Something about rbind seems to repeat all of the rows every time the script writes to the output file. I have not been able to find anything about this online or in the R help for rbind, although perhaps I am missing something obvious.

I greatly appreciate any help with this!

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
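
A note on the likely cause, offered as a sketch rather than a definitive fix: in the posted loop, new.final keeps every record found so far, and write.table(..., append=TRUE) runs on every iteration, so each pass appends the whole accumulated matrix to the file again. rbind itself is probably not at fault. A separate issue is that which() returns integer(0) when there is no match, and if(match>0) then fails. The example below shows one way to do the matching in a single step with match() and write the result once. The input vectors, IDs, and taxonomy strings are hypothetical stand-ins for names(seqs), paste(seqs), and the taxonomy table, and it assumes each header occurs only once in the FASTA file.

```r
# Hypothetical stand-ins for the real inputs (not taken from the original files)
header <- c("seqA", "seqB", "seqC")   # would be names(seqs)
seqs2  <- c("ACGT", "GGCC", "TTAA")   # would be paste(seqs)
tax2   <- data.frame(
  V1 = c("seqB", "seqA", "seqX"),     # sample IDs; "seqX" has no FASTA match
  V2 = c("Bacteria;Firmicutes", "Bacteria;Proteobacteria", "Archaea"),
  stringsAsFactors = FALSE
)

# Position of each taxonomy ID among the FASTA headers (NA when absent).
# match() returns only the first hit, hence the unique-header assumption.
idx  <- match(tax2[, 1], header)
keep <- !is.na(idx)

# Build the new headers, then interleave header and sequence lines:
# rbind() gives a 2-row matrix, and as.vector() reads it column by column.
new.header <- paste0(">", tax2[keep, 1], "|", tax2[keep, 2])
new.final  <- as.vector(rbind(new.header, seqs2[idx[keep]]))

# Write everything once, after the matching is finished
out_file <- tempfile(fileext = ".txt")  # hypothetical output path
writeLines(new.final, out_file)
```

If the loop is kept instead, moving the write.table() call to after the closing brace of the loop (and dropping append=TRUE) should have the same effect of writing each record only once.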